
Open Access 13.10.2022 | Original Article

Distribution matching and structure preservation for domain adaptation

Authors: Ping Li, Zhiwei Ni, Xuhui Zhu, Juan Song

Published in: Complex & Intelligent Systems | Issue 2/2023


Abstract

Cross-domain classification refers to completing a classification task in a target domain that lacks label information by exploiting useful knowledge from a related source domain with a different data distribution. Domain adaptation addresses such cross-domain classification by reducing the divergence between domains and transferring the relevant knowledge from the source to the target. To mine the discriminant information of the source domain samples and the geometric structure information of the domains, and thus improve domain adaptation performance, this paper proposes a novel method involving distribution matching and structure preservation for domain adaptation (DMSP). First, it aligns the subspaces of the source and target domains on the Grassmann manifold and learns undistorted embedded feature representations of the two domains. Second, in this embedded feature space, an adaptive classifier is learned by empirical structural risk minimization with distribution adaptation regularization and intra-domain graph regularization, further adapting the source and target domains. Finally, we perform extensive experiments on widely used cross-domain classification datasets to validate the superiority of DMSP; its average classification accuracy on these datasets is the highest among several state-of-the-art domain adaptation methods.

Introduction

Traditional machine learning methods usually rely on two assumptions when dealing with classification problems. First, the training set and the testing set are independent and identically distributed. Second, there should be large amounts of labeled training samples. However, it is usually expensive and time-consuming to collect labeled samples in real-world applications such as medical image analysis and autonomous vehicles [1]. It is, therefore, necessary to leverage other related data to assist in completing the corresponding classification tasks. Cross-domain classification refers to classifying the samples from a target domain with the help of the labeled samples from a related but different source domain, where the source domain has rich label information but the target domain lacks label information [2]. As shown in Fig. 1, cross-domain classification covers various fields, such as object recognition between domains sampled under different illuminations, backgrounds, and visual angles; ship classification between domains sampled from real ships and synthetic ships, respectively; and histopathological image classification between domains sampled from different medical institutions. Due to the distribution difference across domains, a classifier trained on the source domain usually performs poorly on the target domain. Therefore, cross-domain classification remains a challenging problem.
Domain adaptation (DA) can deal with such cross-domain classification problems, where the marginal distribution or conditional distribution of the source samples and the target samples differ. It uses useful knowledge from a source domain with large-scale labeled samples to assist in completing the classification task in the target domain, and has been applied to practical classification problems [3–5]. DA usually assumes that the source and target domains share the same label space, aiming to explore useful knowledge in the source domain to classify the target data. DA mainly comprises supervised DA and unsupervised DA, of which unsupervised DA is more challenging in practical applications since it has no label information in the target domain. In this paper, our principal focus is on unsupervised DA.
DA methods can be divided into two categories, deep DA methods and non-deep DA methods, depending on whether they learn deep adaptive features. Deep DA methods can work directly on original image data, using deep neural networks to perform end-to-end domain adaptation. However, such methods need to retrain the deep network parameters; the training is not stable enough, and the tuning process is complex. Non-deep DA methods mainly include instance re-weighting methods [6–8], feature adaptation methods [9–12] and adaptive classifier learning methods [13–15]. Significantly, non-deep DA methods can also act on deep features, using labeled samples in the source domain to learn classifiers that generalize well to the target domain [16].
The usual strategy adopted by instance re-weighting methods is to weight or resample the source samples to assist the target domain in its classification task. However, such methods are not suitable for cross-domain classification where the domains diverge in conditional distribution. Feature adaptation methods learn new feature representations of different domains by aligning their subspaces or minimizing a predefined distribution distance between them. In general, feature adaptation methods can only reduce, but cannot remove, the domain divergence. Moreover, feature adaptation methods need to use traditional classification methods to train a classifier on the source labeled samples, and the remaining divergence will affect the performance of this classifier on the target domain. Adaptive classifier learning methods directly train an adaptive classifier by jointly minimizing the classification function and the distribution difference between the source and target domains. According to DA-related theory [17], the classification accuracy of the learned classifier on the target domain is affected to a certain extent by factors such as the distribution difference between the domains and its empirical error in the source domain.
However, adaptive classifier learning methods usually ignore the inter-class difference and the intra-domain local structure [15]. Actually, the inter-class difference in the source domain has an important impact on the discriminant structure of the classifier, and adjacent samples in the same domain usually belong to the same category [18]. Therefore, in this paper, we propose a new DA method, namely distribution matching and structure preservation for domain adaptation (DMSP). It learns the adaptive classifier by simultaneously minimizing the intra-domain graph regularization, the structural risk function, and the distribution adaptation regularization, which is related to the class-wise distribution distance between domains and the inter-class distance in the source domain. In addition, adaptive classifier learning methods are usually designed for the original feature space, where feature distortion negatively affects DA performance [19]. To address this problem, DMSP learns the adaptive classifier on the Grassmann manifold (GM), where feature distortion can be avoided.
The contributions of this paper are as follows:
(1) We propose intra-domain graph regularization to preserve the respective local structures of the source and target domains, which is helpful to learn a discriminative classifier.
(2) Our DMSP jointly minimizes the structural risk function, the distribution adaptation regularization and the intra-domain graph regularization, to match the distributions of the source and target domains and obtain an adaptive classifier which is robust and discriminative for the target data. Furthermore, the classifier is learned on the Grassmann manifold, where feature distortion can be avoided, so that its performance is improved.
(3) We conduct comprehensive experiments on several cross-domain image datasets. Experimental results verify the effectiveness and superior performance of our method.

Related work

In this section, we review previous works related to our proposed method in terms of distribution matching, local structure preservation, and adaptive classifier learning.

Distribution matching

Distribution matching aims to reduce the distribution difference between the source and target domains. Maximum mean discrepancy (MMD) [20] is often used to measure the distribution difference. Based on MMD, joint distribution adaptation (JDA) [21] minimizes the class-wise distribution distance between the source and target domains to match their marginal and conditional distribution difference. Furthermore, to mine discriminative information within the two domains, domain invariant and class discriminative feature learning (DICD) [22] also minimizes the intra-class scatter and maximizes the inter-class dispersion simultaneously, while unsupervised metric transfer learning method (UMTL) [23] also maximizes the inter-class distance.
The above methods conduct distribution matching within a dimensionality reduction procedure to learn a shared feature space. However, they still need a traditional classification method to predict the labels of the target data. Moreover, cross-domain discrepancy still remains in the reduced latent feature space. Our DMSP minimizes the class-wise distribution distance and maximizes the inter-class distance simultaneously as in [23], but in a different way: it conducts distribution matching through a classifier learning procedure, which directly yields an adaptive classifier for the target data.

Local structure preservation

Local structure preservation aims to preserve the local geometric structure of given data. Graph regularization is devoted to preserving the local structure, encouraging samples from the same category to be close to each other. In DA problems, [24, 25] minimize a graph regularization defined over the whole of the source and target data to preserve their spatial relationship, which can be regarded as inter-domain graph regularization. [26, 27] minimize such inter-domain graph regularization to ensure that the inferred data labels comply with the local structure of the source and target data.
The methods mentioned above use the inter-domain graph regularization to preserve the local structure of the whole data in different domains (i.e., inter-domain local structure). However, the difference across domains might cause the local manifold structures of the source data and target data to be different. Specifically, the adjacent samples from different domains might belong to different categories. As a result, forcibly minimizing the inter-domain graph regularization will degrade the DA performance. This paper proposes intra-domain graph regularization concerning source and target data separately, which can preserve the respective local manifold structures of source data and target data.

Adaptive classifier learning

Adaptive classifier learning aims to directly obtain an adaptive classifier that is trained on the source labeled samples and performs well on the target data. Adaptation regularization based transfer learning (ARTL) [15] learns the adaptive classifier by jointly minimizing the inter-domain graph regularization, the structural risk function, and the class-wise distribution distance between domains. The clustering-based domain adaptation method DCA [28] further explores the cluster structure of the target data. However, ARTL and DCA are designed only for the original feature space, where feature distortion undermines performance [19]. On the basis of ARTL, manifold dynamic distribution adaptation (MDDA) [29] learns the adaptive classifier on the Grassmann manifold (GM) to overcome feature distortion.
DMSP is also based on ARTL aiming to learn the adaptive classifier on the GM, but it is different from ARTL and MDDA. On the one hand, ARTL and MDDA utilize the inter-domain graph regularization to preserve the manifold consistency underlying the whole data, which will result in different inferred labels for adjacent samples due to different local manifold structures of the source data and the target data. In contrast, DMSP uses the intra-domain graph regularization to induce adjacent samples in the same domain to be inferred as the same label. On the other hand, ARTL and MDDA fail to consider the impact of discriminative information within domains on the discriminant structure of the adaptive classifier, while DMSP maximizes the inter-class distance to improve the discriminant structure.

Proposed method

In this section, we detail the proposed distribution matching and structure preservation for domain adaptation (DMSP) method.

Problem statement

Given the source domain \(\mathcal{D}_{s} { = }\left\{ {{\mathbf{x}}_{i} ,y_{i} } \right\}_{i = 1}^{{n_{s} }}\) and the target domain \(\mathcal{D}_{t} { = }\left\{ {{\mathbf{x}}_{j} } \right\}_{{j = n_{s} + 1}}^{{n_{s} + n_{t} }}\), where \({\mathbf{x}}_{i} \in {\mathbb{R}}^{1 \times m}\) is a source sample associated with its label \(y_{i} \in \left\{ {1,2, \ldots ,Cl} \right\}\) and \({\mathbf{x}}_{j} \in {\mathbb{R}}^{1 \times m}\) is a target sample, we assume that the two domains have different distributions, but share the same label space and feature space. The goal of DMSP is to train an adaptive classifier on \(\mathcal{D}_{s}\), which can perform well on \(\mathcal{D}_{t}\) with low expectation error.

Problem formulation

DMSP works in two steps. In the first step, DMSP adopts the geodesic flow kernel (GFK) [30] to perform manifold feature learning on the Grassmann manifold (GM), which avoids feature distortion in the original space and aligns the subspaces of the source and target domains. Specifically, the bases of the source and target subspaces are denoted as \({\mathbf{S}}_{1} \in {\mathbb{R}}^{m \times d}\) and \({\mathbf{S}}_{2} \in {\mathbb{R}}^{m \times d}\), respectively, where \(d\) is the dimension of the low-dimensional linear subspace. Considering \({\mathbf{S}}_{1}\) and \({\mathbf{S}}_{2}\) as two points on the GM, we know from [30] that any \({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} \in {\mathbb{R}}^{m}\) in the original feature space can be projected into a manifold embedded feature space, denoted \({\mathbf{z}}_{i} ,{\mathbf{z}}_{j}\), and their inner product is
$$ \left\langle {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} } \right\rangle = \int_{0}^{1} {\left( {\Psi \left( t \right)^{{\text{T}}} {\mathbf{x}}_{i} } \right)^{{\text{T}}} \left( {\Psi \left( t \right)^{{\text{T}}} {\mathbf{x}}_{j} } \right){\text{d}}t = } {\mathbf{x}}_{i}^{{\text{T}}} {\mathbf{Gx}}_{j} , $$
(1)
where \({\mathbf{G}}\) is the geodesic flow kernel matrix, \(\Psi \left( t \right)\) represents the geodesic flow from \({\mathbf{S}}_{1}\) to \({\mathbf{S}}_{2}\), \(t \in \left[ {0, \, 1} \right]\), \(\Psi \left( 0 \right) = {\mathbf{S}}_{1}\), \(\Psi \left( 1 \right) = {\mathbf{S}}_{2}\). According to (1), the distance between samples on the GM is calculated, and then based on the labeled information of the source samples, the initial pseudo labels \({\overline{\mathbf{Y}}}_{t} = \left[ {\overline{y}_{{n_{s} + 1}} , \ldots ,\overline{y}_{{n_{s} + n_{t} }} } \right]\) of the target samples are obtained using the 1-nearest neighbor (1-NN). In addition, the manifold embedded feature representations of the source and target samples can be obtained in explicit form from [19], that is, \({\mathbf{z}}_{i} = \sqrt {\mathbf{G}} {\mathbf{x}}_{i} ,i = 1,2, \ldots ,n_{s} + n_{t}\).
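For concreteness, the following Python sketch (not the authors' implementation) illustrates this first step under the assumption that the geodesic flow kernel matrix G has already been computed by GFK; the function names and the use of scikit-learn's 1-NN classifier are our own illustrative choices.

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.neighbors import KNeighborsClassifier

def embed_on_manifold(Xs, Xt, G):
    """Project samples into the GFK-embedded space via z = G^{1/2} x.

    Xs: (n_s, m) source features, Xt: (n_t, m) target features,
    G:  (m, m) geodesic flow kernel matrix (assumed precomputed by GFK).
    """
    sqrt_G = np.real(sqrtm(G))          # matrix square root of the kernel
    return Xs @ sqrt_G, Xt @ sqrt_G     # embedded source/target representations

def initial_pseudo_labels(Zs, ys, Zt):
    """Initial target pseudo labels from a 1-NN classifier in the embedded space."""
    knn = KNeighborsClassifier(n_neighbors=1).fit(Zs, ys)
    return knn.predict(Zt)
```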
In the second step, DMSP learns an adaptive classifier by simultaneously optimizing the structural risk function, inter-domain distribution matching and intra-domain local structure preservation. The objective function of DMSP is formulated as
$$\begin{aligned} & \mathop {\min }\limits_{{f \in \mathcal{H}_{k} }} \, \sum\limits_{i = 1}^{{n_{s} }} {\left( {y_{i} - f\left( {{\mathbf{z}}_{i} } \right)} \right)^{2} } + \sigma \left\| f \right\|_{k}^{2} + \gamma M_{f,k} \left( {\mathcal{D}_{s} ,\mathcal{D}_{t} } \right)\\ & \quad + \lambda \left( D_{f,k} \left( {\mathcal{D}_{s} ,\mathcal{D}_{t} } \right) - \beta \overline{D}_{f,k} \left( {\mathcal{D}_{s} } \right)\right),\end{aligned} $$
(2)
where \(k\left\langle { \cdot , \, \cdot } \right\rangle\) is the kernel function, \(\mathcal{H}_{k}\) represents the corresponding Hilbert space, \(\left\| f \right\|_{k}^{2}\) is the square norm of \(f\), the shrinkage regularization parameter \(\sigma > 0\), the intra-domain graph regularization parameter \(\gamma > 0\), the distribution adaptation regularization parameter \(\lambda > 0\), and the trade-off parameter \(\beta > 0\).
The first two terms in formula (2) constitute the structural risk minimization on the source domain. According to the representer theorem [31], the structural risk minimization can be reformulated as
$$\begin{aligned} & \sum\limits_{i = 1}^{{n_{s} }} {\left( {y_{i} - f\left( {{\mathbf{z}}_{i} } \right)} \right)^{2} } + \sigma \left\| f \right\|_{k}^{2}\\ & = \left\| {\left( {{\mathbf{Y}} - {{\varvec{\upomega}}}^{{\text{T}}} {\mathbf{K}}} \right){\mathbf{E}}} \right\|_{F}^{2} + \sigma tr\left( {{{\varvec{\upomega}}}^{{\text{T}}} {\mathbf{K}}}{\varvec{\upomega}} \right),\end{aligned} $$
(3)
where \(\left\| \cdot \right\|_{F}^{2}\) denotes the squared Frobenius norm, \({\mathbf{Y}} = \left[ {y_{1} , \ldots ,y_{{n_{s} }} ,\overline{y}_{{n_{s} + 1}} , \ldots ,\overline{y}_{{n_{s} + n_{t} }} } \right]\) is the label matrix (for multi-class problems, \({\mathbf{Y}} \in {\mathbb{R}}^{{Cl \times \left( {n_{s} + n_{t} } \right)}}\)), \({{\varvec{\upomega}}} = \left[ {\omega_{1} ,\omega_{2} , \ldots ,\omega_{{n_{s} + n_{t} }} } \right]^{{\text{T}}}\) is the coefficient matrix, and \({\mathbf{K}} \in {\mathbb{R}}^{{\left( {n_{s} + n_{t} } \right) \times \left( {n_{s} + n_{t} } \right)}}\) is the kernel matrix with elements \(\left( {\mathbf{K}} \right)_{ij} = k\left( {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} } \right)\). The diagonal matrix \({\mathbf{E}} = {\text{diag}}\left( {E_{1} ,E_{2} , \ldots ,E_{{n_{s} + n_{t} }} } \right)\), with \(E_{i} = 1\) for \(1 \le i \le n_{s}\) and \(E_{i} = 0\) for \(i \ge n_{s} + 1\), removes the unreliable pseudo labels in the target domain.
The third term in formula (2) is the proposed intra-domain graph regularization, which aims to preserve the respective local structures of the source and target domains and to induce adjacent samples in the same domain to be assigned the same label. Let \(\mathcal{N}_{p}^{s} \left( \cdot \right)\) and \(\mathcal{N}_{p}^{t} \left( \cdot \right)\) be the sets of p-nearest neighbors in the source and target domains, respectively. The intra-domain graph regularization is defined as
$$\begin{aligned} M_{f,k} \left( {\mathcal{D}_{s} ,\mathcal{D}_{t} } \right)& = \sum\limits_{i,j = 1}^{{n_{s} }} {\left( {f\left( {{\mathbf{z}}_{i} } \right) - f\left( {{\mathbf{z}}_{j} } \right)} \right)^{2} \left( {{\mathbf{W}}_{s} } \right)_{i,j} }\\ &\quad + \sum\limits_{{i,j = n_{s} + 1}}^{{n_{s} + n_{t} }} {\left( {f\left( {{\mathbf{z}}_{i} } \right) - f\left( {{\mathbf{z}}_{j} } \right)} \right)^{2} \left( {{\mathbf{W}}_{t} } \right)_{i,j}},\end{aligned} $$
(4)
where
$$\begin{aligned} \left( {{\mathbf{W}}_{s} } \right)_{i,j} = \left\{ {\begin{array}{lllll} {\cos \left( {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} } \right),} & {\left( {{\mathbf{z}}_{i} \in \mathcal{N}_{p}^{s} \left( {{\mathbf{z}}_{j} } \right) \vee {\mathbf{z}}_{j} \in \mathcal{N}_{p}^{s} \left( {{\mathbf{z}}_{i} } \right)} \right)}\\ & {\wedge {\mathbf{z}}_{i} \in \mathcal{D}_{s} \wedge {\mathbf{z}}_{j} \in \mathcal{D}_{s} } \\ {0,} & {{\text{otherwise}}} \\ \end{array} } \right.,\end{aligned} $$
(5)
$$\begin{aligned} \left( {{\mathbf{W}}_{t} } \right)_{i,j} = \left\{ {\begin{array}{llll} {\cos \left( {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} } \right),} & {\left( {{\mathbf{z}}_{i} \in \mathcal{N}_{p}^{t} \left( {{\mathbf{z}}_{j} } \right) \vee {\mathbf{z}}_{j} \in \mathcal{N}_{p}^{t} \left( {{\mathbf{z}}_{i} } \right)} \right)}\\ &{\wedge {\mathbf{z}}_{i} \in \mathcal{D}_{t} \wedge {\mathbf{z}}_{j} \in \mathcal{D}_{t} } \\ {0,} & {{\text{otherwise}}} \\ \end{array} } \right..\end{aligned} $$
(6)
Let \({\mathbf{W}} = {\mathbf{W}}_{s} + {\mathbf{W}}_{t}\) and define the intra-domain graph Laplacian matrix \({\mathbf{L}} = {\mathbf{I}} - {\mathbf{B}}^{ - 1/2} {\mathbf{WB}}^{ - 1/2}\), where \({\mathbf{I}}\) is the identity matrix, \({\mathbf{B}} = {\text{diag}}\left( {B_{1} ,B_{2} , \ldots ,B_{{n_{s} + n_{t} }} } \right)\), and \(B_{i} = \sum\nolimits_{j = 1}^{{n_{s} + n_{t} }} {{\mathbf{W}}_{ij} }\). The intra-domain graph regularization can be expressed as
$$\begin{aligned} M_{f,k} \left( {\mathcal{D}_{s} ,\mathcal{D}_{t} } \right)&= \sum\limits_{i,j = 1}^{{n_{s} + n_{t} }} {\left( {f\left( {{\mathbf{z}}_{i} } \right) - f\left( {{\mathbf{z}}_{j} } \right)} \right)^{2} \left( {\mathbf{W}} \right)_{i,j} }\\ & {\text{ = tr}}\left( {{{\varvec{\upomega}}}^{{\text{T}}} {\mathbf{KLK}}^{{\text{T}}} {{\varvec{\upomega}}}} \right).\end{aligned} $$
(7)
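As an illustration, the intra-domain graph of Eqs. (5)-(6) and the normalized Laplacian used in Eq. (7) could be assembled as in the sketch below; the helper names, the cosine-similarity computation and the small numerical safeguards are our own assumptions rather than the authors' code.

```python
import numpy as np

def intra_domain_laplacian(Zs, Zt, p=10):
    """Build the block-diagonal intra-domain graph W = diag(Ws, Wt) with cosine
    weights on p-nearest-neighbour pairs, and L = I - B^{-1/2} W B^{-1/2}."""
    def knn_graph(Z):
        Zn = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
        S = Zn @ Zn.T                               # cosine similarities
        n = Z.shape[0]
        W = np.zeros((n, n))
        for i in range(n):
            nbrs = np.argsort(-S[i])[1:p + 1]       # p nearest neighbours, excluding i
            W[i, nbrs] = S[i, nbrs]
        return np.maximum(W, W.T)                   # keep edge if i in N(j) or j in N(i)

    n_s, n_t = Zs.shape[0], Zt.shape[0]
    W = np.zeros((n_s + n_t, n_s + n_t))
    W[:n_s, :n_s] = knn_graph(Zs)                   # Ws: edges within the source domain
    W[n_s:, n_s:] = knn_graph(Zt)                   # Wt: edges within the target domain
    B_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1) + 1e-12))
    return np.eye(n_s + n_t) - B_inv_sqrt @ W @ B_inv_sqrt
```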
The fourth term in formula (2) is the inter-domain distribution adaptation regularization, which simultaneously minimizes the class-wise distribution distance between domains and maximizes the inter-class distance in the source domain as in [23], where the class-wise distribution distance is formulated as
$$\begin{aligned} D_{f,k} \left( {\mathcal{D}_{s} ,\mathcal{D}_{t} } \right)&= \left\| {\frac{{1}}{{n_{s} }}\sum\limits_{i = 1}^{{n_{s} }} {f\left( {{\mathbf{z}}_{i} } \right)} - \frac{{1}}{{n_{t} }}\sum\limits_{{j = n_{s} + 1}}^{{n_{s} + n_{t} }} {f\left( {{\mathbf{z}}_{j} } \right)} } \right\|_{\mathcal{H}}^{2}\\ & { + }\sum\limits_{c = 1}^{Cl} {\left\| {\frac{{1}}{{n_{s}^{\left( c \right)} }}\sum\limits_{{{\mathbf{z}}_{i} \in \mathcal{D}_{s}^{\left( c \right)} }} {f\left( {{\mathbf{z}}_{i} } \right)} - \frac{{1}}{{n_{t}^{\left( c \right)} }}\sum\limits_{{{\mathbf{z}}_{j} \in \mathcal{D}_{t}^{\left( c \right)} }} {f\left( {{\mathbf{z}}_{j} } \right)} } \right\|_{\mathcal{H}}^{2} },\end{aligned} $$
(8)
and the inter-class distance is
$$ \overline{D}_{f,k} \left( {\mathcal{D}_{s} } \right){ = }\sum\limits_{c = 1}^{Cl} {\left\| {\frac{{1}}{{n_{s}^{\left( c \right)} }}\sum\limits_{{{\mathbf{z}}_{i} \in \mathcal{D}_{s}^{\left( c \right)} }} {f\left( {{\mathbf{z}}_{i} } \right)} - \frac{{1}}{{n_{s}^{{\left( {\overline{c}} \right)}} }}\sum\limits_{{{\mathbf{z}}_{j} \notin \mathcal{D}_{s}^{\left( c \right)} }} {f\left( {{\mathbf{z}}_{j} } \right)} } \right\|_{\mathcal{H}}^{2} } . $$
(9)
Finally, the inter-domain distribution adaptation becomes
$$ D_{f,k} \left( {\mathcal{D}_{s} ,\mathcal{D}_{t} } \right) - \beta \overline{D}_{f,k} \left( {\mathcal{D}_{s} } \right) = \sum\limits_{c = 0}^{Cl} {tr\left( {{{\varvec{\upomega}}}^{{\text{T}}} {\mathbf{KM}}_{c} {\mathbf{K}}^{{\text{T}}} {{\varvec{\upomega}}}} \right)} - \beta \sum\limits_{c = 1}^{Cl} {tr\left( {{{\varvec{\upomega}}}^{{\text{T}}} {\mathbf{K\overline{M}}}_{c} {\mathbf{K}}^{{\text{T}}} {{\varvec{\upomega}}}} \right)} = tr\left( {{{\varvec{\upomega}}}^{{\text{T}}} {\mathbf{K}}\left( {{\mathbf{M}} - \beta {\overline{\mathbf{M}}}} \right){\mathbf{K}}^{{\text{T}}} {{\varvec{\upomega}}}} \right). $$
(10)
$$ \left( {{\mathbf{M}}_{c} } \right)_{i,j} = \left\{ {\begin{array}{*{20}l} {\frac{1}{{n_{s}^{\left( c \right)} n_{s}^{\left( c \right)} }},} \hfill & {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} \in \mathcal{D}_{s}^{\left( c \right)} } \hfill \\ {\frac{1}{{n_{t}^{\left( c \right)} n_{t}^{\left( c \right)} }},} \hfill & {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} \in \mathcal{D}_{t}^{\left( c \right)} } \hfill \\ {\frac{ - 1}{{n_{s}^{\left( c \right)} n_{t}^{\left( c \right)} }},} \hfill & {\left\{ {\begin{array}{*{20}c} {{\mathbf{z}}_{i} \in \mathcal{D}_{s}^{\left( c \right)} \wedge {\mathbf{z}}_{j} \in \mathcal{D}_{t}^{\left( c \right)} } \\ {{\mathbf{z}}_{i} \in \mathcal{D}_{t}^{\left( c \right)} \wedge {\mathbf{z}}_{j} \in \mathcal{D}_{s}^{\left( c \right)} } \\ \end{array} } \right.} \hfill \\ {0,} \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right., $$
(11)
$$ \left( {{\overline{\mathbf{M}}}_{c} } \right)_{i,j} = \left\{ {\begin{array}{*{20}l} {\frac{1}{{n_{s}^{\left( c \right)} n_{s}^{\left( c \right)} }},} \hfill & {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} \in \mathcal{D}_{s}^{\left( c \right)} } \hfill \\ {\frac{1}{{n_{s}^{{\left( {\overline{c}} \right)}} n_{s}^{{\left( {\overline{c}} \right)}} }},} \hfill & {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} \in \mathcal{D}_{s}^{{\left( {\overline{c}} \right)}} } \hfill \\ {\frac{ - 1}{{n_{s}^{\left( c \right)} n_{s}^{{\left( {\overline{c}} \right)}} }},} \hfill & {\left\{ {\begin{array}{*{20}c} {{\mathbf{z}}_{i} \in \mathcal{D}_{s}^{\left( c \right)} \wedge {\mathbf{z}}_{j} \in \mathcal{D}_{s}^{{\left( {\overline{c}} \right)}} } \\ {{\mathbf{z}}_{i} \in \mathcal{D}_{s}^{{\left( {\overline{c}} \right)}} \wedge {\mathbf{z}}_{j} \in \mathcal{D}_{s}^{\left( c \right)} } \\ \end{array} } \right.} \hfill \\ {0,} \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right., $$
(12)
where \(\mathcal{D}_{t}^{\left( 0 \right)} = \mathcal{D}_{t}\), \(\mathcal{D}_{s}^{\left( 0 \right)} = \mathcal{D}_{s}\), \(\mathcal{D}_{t}^{\left( c \right)} = \left\{ {{\mathbf{z}}_{j} :{\mathbf{z}}_{j} \in \mathcal{D}_{t} \wedge \overline{y}_{j} = c \ne 0} \right\}\), \(\mathcal{D}_{s}^{\left( c \right)} = \left\{ {{\mathbf{z}}_{i} :{\mathbf{z}}_{i} \in \mathcal{D}_{s} \wedge y_{i} = c \ne 0} \right\}\), \(n_{s}^{\left( c \right)} = \left| {\mathcal{D}_{s}^{\left( c \right)} } \right|\), \(n_{t}^{\left( c \right)} = \left| {\mathcal{D}_{t}^{\left( c \right)} } \right|\), \(n_{s}^{{\left( {\overline{c}} \right)}} = n_{s} - n_{s}^{\left( c \right)}\), \({\mathbf{M}} = \sum\nolimits_{c = 0}^{Cl} {{\mathbf{M}}_{c} }\), \({\overline{\mathbf{M}}} = \sum\nolimits_{c = 1}^{Cl} {{\overline{\mathbf{M}}}_{c} }\).
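Since each nonzero block of \({\mathbf{M}}_{c}\) and \({\overline{\mathbf{M}}}_{c}\) is the outer product of two normalized class-indicator vectors, the matrices \({\mathbf{M}}\) and \({\overline{\mathbf{M}}}\) can be assembled directly from the labels. The sketch below illustrates this, assuming integer labels in {1, ..., Cl}; it is an illustrative reading of Eqs. (11)-(12), not the authors' code.

```python
import numpy as np

def mmd_matrices(ys, yt_pseudo, n_classes):
    """Assemble M = sum_{c=0..Cl} M_c (Eq. (11)) and Mbar = sum_{c=1..Cl} Mbar_c (Eq. (12))."""
    ys, yt_pseudo = np.asarray(ys), np.asarray(yt_pseudo)
    n_s, n_t = len(ys), len(yt_pseudo)
    n = n_s + n_t
    M, Mbar = np.zeros((n, n)), np.zeros((n, n))
    # c = 0: marginal distribution term over all source and target samples
    e = np.concatenate([np.ones(n_s) / n_s, -np.ones(n_t) / n_t])
    M += np.outer(e, e)
    for c in range(1, n_classes + 1):
        src = np.where(ys == c)[0]                  # source samples of class c
        tgt = n_s + np.where(yt_pseudo == c)[0]     # target samples pseudo-labelled c
        oth = np.where(ys != c)[0]                  # source samples not in class c
        if len(src) and len(tgt):                   # conditional term M_c
            e = np.zeros(n)
            e[src], e[tgt] = 1.0 / len(src), -1.0 / len(tgt)
            M += np.outer(e, e)
        if len(src) and len(oth):                   # source inter-class term Mbar_c
            e = np.zeros(n)
            e[src], e[oth] = 1.0 / len(src), -1.0 / len(oth)
            Mbar += np.outer(e, e)
    return M, Mbar
```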

Problem solving and algorithm description

Substituting formulas (3), (7) and (10), the objective function of DMSP in formula (2) can be reformulated as
$$ \mathop {\min }\limits_{{{\varvec{\upomega}}}} \, L\left( {{\varvec{\upomega}}} \right){ = }tr\left( {\left( {{\mathbf{Y}} - {{\varvec{\upomega}}}^{{\text{T}}} {\mathbf{K}}} \right){\mathbf{E}}\left( {{\mathbf{Y}} - {{\varvec{\upomega}}}^{{\text{T}}} {\mathbf{K}}} \right)^{{\text{T}}} } \right) + tr\left( {{{\varvec{\upomega}}}^{{\text{T}}} \left( {\sigma {\mathbf{K}} + {\mathbf{K}}\left( {\lambda \left( {{\mathbf{M}} - \beta {\overline{\mathbf{M}}}} \right) + \gamma {\mathbf{L}}} \right){\mathbf{K}}^{{\text{T}}} } \right){{\varvec{\upomega}}}} \right). $$
(13)
Setting \(\partial L\left( {{\varvec{\upomega}}} \right)/\partial {{\varvec{\upomega}}} = {\mathbf{0}}\), we obtain the solution
$$ {{\varvec{\upomega}}} = \left( {\sigma {\mathbf{I}} + \left( {{\mathbf{E}} + \lambda \left( {{\mathbf{M}} - \beta {\overline{\mathbf{M}}}} \right) + \gamma {\mathbf{L}}} \right){\mathbf{K}}} \right)^{ - 1} {\mathbf{EY}}^{{\text{T}}} . $$
(14)
Then, we obtain the adaptive classifier \(f\left( {\mathbf{z}} \right) = \sum\nolimits_{i = 1}^{{n_{s} + n_{t} }} {\omega_{i} k\left( {{\mathbf{z}}_{i} ,{\mathbf{z}}} \right)}\). Like ARTL [15], DMSP iteratively updates the target pseudo labels with this classifier until the solution stabilizes.
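Putting the pieces together, one DMSP update could look like the following sketch: build the kernel matrix on the embedded features, solve Eq. (14) in closed form, and re-predict the target pseudo labels. The Gaussian-kernel bandwidth (scikit-learn's default), the one-hot label encoding and the default hyper-parameter values are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def dmsp_update(Zs, ys, Zt, yt_pseudo, L, M, Mbar, n_classes,
                sigma=0.1, lam=10.0, gamma=1.0, beta=0.1):
    """Solve Eq. (14) for the coefficient matrix omega and return new target pseudo labels."""
    Z = np.vstack([Zs, Zt])
    n_s, n = Zs.shape[0], Zs.shape[0] + Zt.shape[0]
    K = rbf_kernel(Z)                                       # Gaussian kernel matrix K
    y_all = np.concatenate([ys, yt_pseudo]).astype(int)
    Y = np.eye(n_classes + 1)[y_all][:, 1:].T               # one-hot label matrix, Cl x n
    E = np.diag(np.r_[np.ones(n_s), np.zeros(n - n_s)])     # masks unreliable target labels
    A = sigma * np.eye(n) + (E + lam * (M - beta * Mbar) + gamma * L) @ K
    omega = np.linalg.solve(A, E @ Y.T)                     # closed-form solution, Eq. (14)
    F = K @ omega                                           # f(z) evaluated on all samples
    return np.argmax(F[n_s:], axis=1) + 1                   # updated target pseudo labels
```

In a full run, this update would be repeated for T iterations (T = 10 in the experiments), with the returned pseudo labels fed back into the construction of M and Mbar.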
Finally, DMSP is summarized in Algorithm 1. In addition, a chart showing the architecture of DMSP is given in Fig. 2.

Experimental analysis

In this section, we evaluate the performance of DMSP through extensive experiments on widely used cross-domain image datasets.

Dataset descriptions

This paper uses public cross-domain classification datasets for experiments, including Office + Caltech, ImageCLEF-DA and Office-31. The Office + Caltech dataset [30] consists of 4 domains, drawn from Amazon, Webcam, DSLR and Caltech-256, respectively. These 4 domains are abbreviated as A10, W10, D10 and C10 and share 10 common categories. Accordingly, 12 cross-domain classification tasks can be formed, i.e., A10 → C10, W10 → D10, etc. Note that \(\mathcal{D}_{s}\) → \(\mathcal{D}_{t}\) denotes the cross-domain classification task from \(\mathcal{D}_{s}\) to \(\mathcal{D}_{t}\).
Office-31 [32] consists of three domains: Amazon, Webcam and DSLR, abbreviated as A31, W31 and D31, with 31 shared categories. Similarly, we can construct 6 tasks. ImageCLEF-DA is composed of three domains, drawn from Caltech-256, ImageNet ILSVRC 2012 and Pascal VOC 2012, respectively. These three domains are abbreviated as C12, I12 and P12, with 12 common categories, and 6 tasks can be formed. For Office + Caltech, the SURF [30] features and Decaf6 [33] features are used for experiments. For ImageCLEF-DA and Office-31, we adopt ResNet-50 features extracted from the ResNet-50 model [34]. Figure 3 shows some example images from these datasets and Table 1 lists their descriptions.
Table 1
Dataset descriptions

Dataset                    Domain  #Classes  #Features  #Samples
Office + Caltech (SURF)    A10     10        256        958
Office + Caltech (SURF)    W10     10        256        295
Office + Caltech (SURF)    D10     10        256        157
Office + Caltech (SURF)    C10     10        256        1123
Office + Caltech (Decaf6)  A10     10        4096       958
Office + Caltech (Decaf6)  W10     10        4096       295
Office + Caltech (Decaf6)  D10     10        4096       157
Office + Caltech (Decaf6)  C10     10        4096       1123
ImageCLEF-DA               P12     12        2048       600
ImageCLEF-DA               I12     12        2048       600
ImageCLEF-DA               C12     12        2048       600
Office-31                  A31     31        2048       2817
Office-31                  W31     31        2048       498
Office-31                  D31     31        2048       795

Experimental setup

We compare DMSP against the no-adaptation baseline (i.e., 1-NN) and state-of-the-art DA methods, including feature adaptation methods (i.e., transfer component analysis (TCA) [10], GFK [30], subspace alignment (SA) [12], JDA [21], group-lasso regularized optimal transport (OT-GL) [11], joint geometrical and statistical alignment (JGSA) [35], UMTL [23], structure preservation and distribution alignment (SPDA) [24]), adaptive classifier learning methods (i.e., DCA [28], ARTL [15], MDDA [29]), and deep DA methods (i.e., deep adaptation networks (DAN) [36], domain-adversarial neural network (DANN) [37], joint adaptation network (JAN) [38], collaborative and adversarial network (CAN) [39], conditional domain-adversarial network (CDAN) [40], domain-adversarial residual-transfer (DART) [41], multi-representation adaptation network (MRAN) [42], and discriminative manifold propagation (DMP) [43]).
The dimension \(d\) of the low-dimensional subspace in DMSP can be selected by the subspace disagreement measure proposed in [30]. In the comparative experiments, considering that the proposed method and ARTL share several parameters of the same type, for fair comparison we uniformly set the corresponding parameters \(\sigma = 0.1\), \(\lambda = 10\), \(p = 10\), \(T = 10\), and use the Gaussian kernel function. In addition, for datasets using shallow features [i.e., Office + Caltech (SURF)] we set \(\gamma = 0.1\), while for datasets using deep features [i.e., Office + Caltech (Decaf6), ImageCLEF-DA and Office-31] we set \(\gamma = 1\). Furthermore, the trade-off parameter is set to \(\beta = 0.1\) on all datasets. For the other DA methods, parameters are selected based on experience and on the values used in the original literature, and the best result of each method is reported.
In our experiments, we use classification Accuracy on the target domain as the evaluation measurement, which is widely adopted in existing literature [15, 21, 24, 25, 28, 29]:
$$ {\text{Accuracy}} = \frac{{\left| {{\mathbf{x}}:{\mathbf{x}} \in \mathcal{D}_{t} \wedge f\left( {\mathbf{x}} \right) = y} \right|}}{{\left| {{\mathbf{x}}:{\mathbf{x}} \in \mathcal{D}_{t} } \right|}}, $$
(15)
where \(f\left( {\mathbf{x}} \right)\) and \(y\) are the prediction label and truth label of \({\mathbf{x}}\), respectively.
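For completeness, Eq. (15) amounts to the fraction of correctly predicted target samples; a minimal sketch (with hypothetical array inputs) is:

```python
import numpy as np

def target_accuracy(y_pred, y_true):
    """Accuracy on the target domain, Eq. (15)."""
    return float(np.mean(np.asarray(y_pred) == np.asarray(y_true)))
```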

Experimental results

The classification accuracy results on Office + Caltech using SURF and Decaf6 features are reported in Tables 2 and 3, respectively. We observe that the classification accuracy of the no-adaptation baseline (i.e., 1-NN) on Office + Caltech (SURF) and Office + Caltech (Decaf6) is lower than that of all the DA methods. Therefore, whether we use shallow or deep features, DA is necessary for cross-domain classification. Compared with the adaptive classifier learning methods (i.e., DCA, ARTL and MDDA), our DMSP (also an adaptive classifier learning method) preserves the intra-domain local structure and enhances the discriminant structure of the classifier, and thus outperforms them on almost all the tasks. Another observation is that DMSP achieves the highest average classification accuracy on both Office + Caltech (SURF) and Office + Caltech (Decaf6). Even though the Decaf6 features yield an obvious improvement over the SURF features, DMSP still improves the final classification significantly, by more than 1.8% compared with the best comparison method (MDDA).
Table 2
Accuracy (%) on Office + Caltech (SURF)

Tasks      1-NN  GFK   SA    TCA   JDA   OT-GL  JGSA  SPDA  DCA   ARTL  MDDA  DMSP
C10→A10    23.7  41.0  41.3  43.4  44.8  48.4   51.5  52.8  50.5  47.0  56.9  58.3
C10→W10    25.8  40.7  40.0  37.3  41.7  50.2   45.4  40.7  41.7  31.9  53.6  54.2
C10→D10    25.5  41.4  46.5  44.0  45.2  47.8   45.9  51.6  45.9  38.9  51.0  56.1
A10→C10    26.0  40.3  41.1  38.2  39.4  37.9   41.5  43.4  41.2  37.9  46.1  48.1
A10→W10    29.8  40.0  41.0  38.0  38.0  42.0   45.8  43.4  35.9  31.9  49.2  54.2
A10→D10    25.5  36.3  38.9  30.6  39.5  44.6   47.1  46.5  38.2  36.3  45.2  40.1
W10→C10    19.9  30.7  31.9  29.7  31.2  36.6   33.2  32.0  31.8  31.5  32.6  33.9
W10→A10    23.0  31.8  35.6  32.3  32.8  39.6   39.9  37.3  38.9  38.6  41.7  43.7
W10→D10    59.2  87.9  80.9  85.4  89.2  85.4   90.5  89.8  91.1  86.0  91.7  89.8
D10→C10    26.3  30.1  31.2  30.9  31.5  34.3   29.9  33.8  31.8  29.1  32.6  34.0
D10→A10    28.5  32.1  34.3  29.3  33.1  37.9   38.0  38.2  33.6  34.9  42.9  49.3
D10→W10    63.4  84.4  82.4  84.8  89.5  87.8   91.9  82.4  90.9  87.1  90.5  88.8
Average    31.4  44.7  45.4  43.7  46.3  49.4   50.1  49.3  47.6  44.3  52.8  54.2

The best results are marked in bold
Table 3
Accuracy (%) on Office + Caltech (Decaf6)

Tasks      1-NN  GFK    SA     TCA    JDA    OT-GL  JGSA   SPDA   DCA    ARTL   MDDA   DMSP
C10→A10    87.3  88.2   89.4   90.2   90.3   92.1   91.4   92.9   93.5   92.4   93.4   93.5
C10→W10    72.5  77.6   81.4   77.0   85.1   84.2   86.8   92.5   90.5   87.8   90.9   95.9
C10→D10    79.6  86.6   90.5   85.4   89.2   87.3   93.6   92.5   87.9   86.6   90.5   96.8
A10→C10    71.7  79.2   80.6   82.7   84.0   85.5   84.9   88.3   87.8   87.4   88.0   88.9
A10→W10    68.1  70.9   83.1   74.6   78.6   83.1   81.0   89.2   88.8   88.5   87.8   91.2
A10→D10    73.9  82.2   89.2   80.3   80.9   85.0   88.5   84.1   89.2   85.4   91.1   96.2
W10→C10    55.3  69.7   79.8   79.9   84.2   81.5   85.0   86.1   87.4   88.2   88.6   88.6
W10→A10    62.5  76.8   83.8   84.5   90.1   90.6   90.7   93.2   92.2   92.3   91.8   92.2
W10→D10    98.1  100.0  100.0  100.0  100.0  96.3   100.0  100.0  100.0  100.0  100.0  100.0
D10→C10    42.0  71.4   81.4   82.5   85.0   84.1   86.2   87.6   87.4   87.3   87.6   88.2
D10→A10    49.9  76.3   87.1   88.2   91.0   92.3   92.0   93.5   92.2   92.7   93.1   93.6
D10→W10    91.5  99.3   99.3   99.7   100.0  96.3   99.7   100.0  100.0  100.0  98.6   98.6
Average    71.0  81.5   87.1   85.4   88.2   88.2   90.0   91.7   91.4   90.7   91.8   93.6

The best results are marked in bold
From Tables 2 and 3 we also note that on the tasks W10 → C10 (SURF) and D10 → C10 (SURF), DMSP outperforms the other DA methods except OT-GL. A possible explanation is that the difference between these domains is large, and although DMSP preserves the respective local structure of each domain, this does not give it an outright advantage over the sample-based matching in OT-GL. On the tasks W10 → D10 and D10 → W10, DMSP is beaten by JDA, JGSA, DCA and MDDA, as reported in Tables 2 and 3. This is because D10 and W10 are the closest pair of domains, so the intra-domain local structure preservation of DMSP offers no advantage; that D10 and W10 are the closest pair of domains is clearly seen from the 1-NN accuracies. Although DMSP does not achieve the best performance on all tasks, it performs best on most of them (17/24). In particular, on C10 → D10, A10 → W10 (SURF), A10 → C10 (SURF), C10 → W10 (Decaf6) and A10 → D10 (Decaf6), DMSP improves by around 5% over the best comparison method.
The results on the ImageCLEF-DA and Office-31 datasets are reported in Table 4. Using ResNet-50 features, DMSP also outperforms the related methods ARTL and MDDA, because DMSP preserves the intra-domain local structure and makes the inter-class centroids separable. However, on the tasks D31 → W31 and W31 → D31, DMSP does not improve the classification accuracy and is even beaten by the baseline method 1-NN. The main reason is that the two domains are very close, so the advantage of DMSP (e.g., intra-domain local structure preservation) is not fully realized and instead has a negative impact. We also note that although ARTL, MDDA and DMSP are non-deep DA methods, they outperform several deep DA methods (e.g., DAN, DANN, JAN, CAN, DART and MRAN), which demonstrates the significance of non-deep DA methods. Moreover, our DMSP achieves the highest average classification accuracy (88.6%), which is 0.3% higher than the best comparison method. In particular, on the task A31 → D31, DMSP improves by up to nearly 3 points over the best comparison result.
Table 4
Accuracy (%) on Office-31 and ImageCLEF-DA

Tasks      1-NN  DANN  DAN   CAN   JAN   MRAN  CDAN   DART  DMP    ARTL  MDDA  DMSP
A31→W31    75.8  82.0  80.5  81.5  85.4  91.4  93.1   87.3  93.0   88.6  86.0  90.4
D31→W31    96.0  96.9  97.1  98.2  97.4  96.9  98.6   98.4  99.0   97.9  97.1  96.2
W31→D31    99.3  99.1  99.6  99.7  99.8  99.8  100.0  99.9  100.0  99.8  99.2  98.6
A31→D31    79.1  79.7  78.6  85.5  84.7  86.4  92.9   91.6  91.0   84.7  86.3  94.6
D31→A31    60.2  68.2  63.6  65.9  68.6  68.3  71.0   70.3  71.4   73.1  72.1  73.0
W31→A31    59.9  67.4  62.8  63.4  70.0  70.9  69.3   69.7  70.2   73.8  73.2  73.9
I12→P12    74.8  75.0  74.5  78.2  76.8  78.8  78.3   78.3  80.7   79.0  79.8  79.2
P12→I12    74.0  86.0  82.2  87.5  88.0  91.7  91.2   89.3  92.5   91.7  91.5  93.2
I12→C12    89.0  96.2  92.8  94.2  94.7  95.0  96.7   95.3  97.2   95.7  95.7  96.0
C12→I12    83.5  87.0  86.3  89.5  89.5  93.5  91.2   91.0  90.5   92.0  92.0  93.2
C12→P12    71.3  74.3  69.2  75.8  74.2  77.7  77.2   75.2  77.7   77.7  78.8  78.7
P12→C12    76.2  91.5  89.8  89.2  93.5  95.3  93.7   93.1  96.2   95.2  95.5  95.7
Average    78.3  83.6  81.4  84.1  85.2  87.1  87.8   86.6  88.3   87.4  87.3  88.6

The best results are marked in bold

Effectiveness verification

DMSP mainly consists of the following components on top of the structural risk minimization: the inter-domain distribution adaptation regularization with inter-class distance maximization added, the intra-domain graph regularization, and the embedded features. We thus perform an ablation study to investigate these components, removing each component of DMSP separately and observing the change in performance. The four variants of DMSP are as follows: (a) DMSP-L1 removes the inter-domain distribution adaptation regularization; (b) DMSP-L2 removes the intra-domain graph regularization; (c) DMSP-L3 removes the inter-class distance maximization; (d) DMSP-original uses the original features instead of the embedded features to train the classifier.
The classification accuracy results of the variants on several tasks are reported in Table 5. We see that DMSP outperforms all of the variants, suggesting that each component is important to DMSP. As expected, the overall classification performance of DMSP-L1 is the worst among the variants, because distribution matching is essential to reduce the distribution difference between domains. DMSP-original is clearly inferior to DMSP on several tasks (e.g., C10 → A10 (SURF), A10 → W10 (SURF), and D10 → A10 (SURF)), which indicates that embedded features learned on the Grassmann manifold (GM) can effectively alleviate feature distortion and achieve subspace alignment for those domain pairs from Office + Caltech (SURF). DMSP-L2 degrades the classification performance on these tasks, indicating that it is necessary for DMSP to preserve the intra-domain local structure. The inter-class distance maximization separates the inter-class centroids so as to enhance the discriminant structure of the final classifier, so removing it degrades the classification performance: DMSP-L3 has lower classification accuracy than DMSP, indicating the necessity of inter-class distance maximization.
Table 5
Accuracy (%) on cross-domain classification tasks

Tasks                DMSP-L1  DMSP-L2  DMSP-L3  DMSP-original  DMSP
C10 → A10 (SURF)     54.8     58.1     57.9     49.9           58.3
A10 → W10 (SURF)     37.6     50.9     53.6     32.2           54.2
W10 → C10 (SURF)     31.3     33.7     33.8     32.2           33.9
D10 → A10 (SURF)     38.6     48.6     49.2     35.4           49.3
D10 → A10 (Decaf6)   92.2     92.7     93.2     93.2           93.6
D31 → A31            69.0     71.0     72.9     72.2           73.0
W31 → A31            67.3     73.1     73.4     72.7           73.9
C12 → P12            74.7     76.8     78.5     77.8           78.7

The best results are marked in bold
To verify the advantage of our proposed intra-domain graph regularization over the inter-domain graph regularization [15, 29], we replace the intra-domain graph regularization with the inter-domain graph regularization to obtain another variant of DMSP, denoted DMSP-L4, and observe the change in performance. The classification accuracy results of DMSP and DMSP-L4 on several tasks are shown in Fig. 4, where C10 → W10, A10 → W10 and W10 → C10 are from the Office + Caltech (SURF) dataset, and C10 → D10, A10 → D10 and D10 → A10 are from the Office + Caltech (Decaf6) dataset. We find that the classification accuracy of DMSP on each task is higher than that of DMSP-L4, whether based on shallow features (i.e., SURF) or deep features (i.e., Decaf6). This is because the intra-domain graph regularization preserves the respective local structure of each domain, whereas the inter-domain graph regularization ignores the difference between domains and preserves the local structure of the whole data across domains. Therefore, the intra-domain graph regularization is more conducive to enhancing the final classification performance than the inter-domain graph regularization.
To further verify the performance of DMSP, we calculate the average number of samples from a different class in each set of 10-nearest neighbors (ANDC), based on the label distribution in the target domain. The label distribution is given by the final adaptive classifier learned by DMSP. If the final classifier is able to discriminate the target samples, adjacent samples are predicted as the same label and the value of ANDC will be small. Figure 5 shows the ANDC values produced by DMSP and the related works (i.e., ARTL, MDDA) on several tasks from Office + Caltech (SURF). As can be observed, the ANDC values differ greatly. In particular, on the tasks C10 → W10 and A10 → W10, the ANDC value of ARTL reaches almost 7; that is, on average, up to 7 of each 10-nearest neighbors are predicted to belong to different categories. The main reason is that ARTL learns the adaptive classifier in the original space, where feature distortion negatively affects the classifier, and ignores the discriminative information within domains. MDDA learns the adaptive classifier on the Grassmann manifold (GM), but it also ignores the impact of the intra-domain local structure and the inter-class distance on the discriminant structure of the adaptive classifier. Therefore, the ANDC value of MDDA is smaller than that of ARTL, but larger than that of our DMSP, which jointly minimizes the structural risk function, the distribution adaptation regularization with inter-class distance maximization added, and the intra-domain graph regularization, and learns the adaptive classifier on the GM.

Parameter analysis

Experiments are conducted on the randomly selected tasks A10 → C10 with Decaf6 features, A10 → W10 with SURF features, C12 → P12 and W31 → A31 to analyze the parameter sensitivity and convergence of DMSP. Since we have verified that DMSP performs well when \(\sigma = 0.1\), \(\lambda = 10\), \(p = 10\) are fixed to the same values as in ARTL on all cross-domain classification tasks, we only evaluate the sensitivity of the intra-domain graph regularization parameter \(\gamma\) and the trade-off parameter \(\beta\). The classification accuracy curves on the selected tasks are provided in Fig. 6.
\(\gamma\) is the intra-domain graph regularization parameter. If \(\gamma\) is too small, the intra-domain local structure cannot be preserved; if it is too large, the discriminant information of domains and the distribution adaptation will be ignored. Figure 6a shows that DMSP achieves better DA performance in the range \(\gamma \in \left[ {0.1, \, 1} \right]\). Besides, \(\beta\) is the trade-off parameter between inter-domain and inter-class distribution difference. Figure 6b reveals that DMSP has better DA performance in the range \(\beta \in \left[ {0.1, \, 0.4} \right]\).
Finally, we check the convergence property of DMSP through empirical analysis. The convergence curves of the classification accuracy of DMSP are shown in Fig. 7. It can be observed that the classification accuracy of DMSP on each task increases steadily with more iterations and stabilizes within 10 iterations, indicating that DMSP converges within a small number of iterations.

Conclusion

In this paper, a novel method referred to as distribution matching and structure preservation for domain adaptation (DMSP) is proposed. DMSP aims to learn an adaptive classifier on the GM under the principle of structural risk minimization while preserving the intra-domain local structure and matching the distributions of different domains. First, the source and target samples are embedded into the manifold feature space using the GFK method, and their feature subspaces are geometrically aligned. Then, based on the principle of structural risk minimization, an adaptive classifier is learned. During this process, distribution matching is carried out by adding a regularization term based on the inter-domain and inter-class distribution differences, and the respective local structures of the source and target domains are preserved by adding an intra-domain graph regularization term. Comprehensive experiments on several cross-domain classification datasets validate the effectiveness of DMSP and its superiority over other state-of-the-art DA methods. Admittedly, DMSP performs manifold feature learning and adaptive classifier learning in two separate steps, which is not as streamlined as it could be; more simplified and efficient designs of DA models are worthy of further research. In addition, we would like to extend our work to more complex situations, such as those where both the distribution and the feature space differ between the source and target domains.

Declarations

Conflict of interest

All authors declare no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Xue Q, Zhang W, Zha H (2020) Improving domain-adapted sentiment classification by deep adversarial mutual learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 9362–9369
2. Wang D, Lu C, Wu J et al (2020) Softly associative transfer learning for cross-domain classification. IEEE Trans Cybern 50(11):4709–4721
3. Li X, Zhang W, Xu NX et al (2020) Deep learning-based machinery fault diagnostics with domain adaptation across sensors at different places. IEEE Trans Ind Electron 67(8):6785–6794
4. Xu Y, Lang H (2021) Ship classification in SAR images with geometric transfer metric learning. IEEE Trans Geosci Remote Sens 59(8):6799–6813
5. Koehler S, Hussain T, Blair Z et al (2021) Unsupervised domain adaptation from axial to short-axis multi-slice cardiac MR images by incorporating pretrained task networks. IEEE Trans Med Imaging 40(10):2939–2953
6. Miao YQ, Farahat AK, Kamel MS (2015) Ensemble kernel mean matching. In: Proceedings of the IEEE International Conference on Data Mining. IEEE, pp 330–338
7. Chandra S, Haque A, Khan L et al (2016) Efficient sampling-based kernel mean matching. In: Proceedings of the IEEE International Conference on Data Mining. IEEE, pp 811–816
8. Xia R, Pan Z, Xu F (2018) Instance weighting for domain adaptation via trading off sample selection bias and variance. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, pp 13–19
9. Wang J, Chen Y, Hao S et al (2017) Balanced distribution adaptation for transfer learning. In: 2017 IEEE International Conference on Data Mining. IEEE, pp 1129–1134
10. Pan SJ, Tsang IW, Kwok JT et al (2011) Domain adaptation via transfer component analysis. IEEE Trans Neural Networks 22(2):199–210
11. Courty N, Flamary R, Tuia D et al (2017) Optimal transport for domain adaptation. IEEE Trans Pattern Anal Mach Intell 39(9):1853–1865
12. Fernando B, Habrard A, Sebban M et al (2013) Unsupervised visual domain adaptation using subspace alignment. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2960–2967
13. Bruzzone L, Marconcini M (2009) Domain adaptation problems: a DASVM classification technique and a circular validation strategy. IEEE Trans Pattern Anal Mach Intell 32(5):770–787
14. Ghifary M, Kleijn WB, Zhang M (2014) Domain adaptive neural networks for object recognition. In: Pacific Rim International Conference on Artificial Intelligence. Springer, Cham, pp 898–904
15. Long M, Wang J, Ding G et al (2014) Adaptation regularization: a general framework for transfer learning. IEEE Trans Knowl Data Eng 26(5):1076–1089
16. Zhao J, Li L, Deng F et al (2022) Discriminant geometrical and statistical alignment with density peaks for domain adaptation. IEEE Trans Cybern 52(2):1193–1206
17. Ben-David S, Blitzer J, Crammer K et al (2007) Analysis of representations for domain adaptation. In: Advances in Neural Information Processing Systems, pp 137–144
18. Cao Y, Long M, Wang J (2018) Unsupervised domain adaptation with distribution matching machines. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 2795–2802
19. Wang J, Feng W, Chen Y et al (2018) Visual domain adaptation with manifold embedded distribution alignment. In: Proceedings of the 26th ACM International Conference on Multimedia, pp 402–410
20. Gretton A, Borgwardt KM, Rasch MJ et al (2012) A kernel two-sample test. J Mach Learn Res 13:723–773
21. Long M, Wang J, Ding G et al (2013) Transfer feature learning with joint distribution adaptation. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2200–2207
22. Li S, Song S, Huang G et al (2018) Domain invariant and class discriminative feature learning for visual domain adaptation. IEEE Trans Image Process 27(9):4260–4273
23. Huang J, Zhou Z (2019) Transfer metric learning for unsupervised domain adaptation. IET Image Proc 13(5):804–810
24. Xiao T, Liu P, Zhao W et al (2019) Structure preservation and distribution alignment in discriminative transfer subspace learning. Neurocomputing 337:218–234
25. Li J, Lu K, Huang Z et al (2018) Transfer independently together: a generalized framework for domain adaptation. IEEE Trans Cybern 49(6):2144–2155
26. Luo L, Chen L, Hu S et al (2020) Discriminative and geometry-aware unsupervised domain adaptation. IEEE Trans Cybern 50(9):3914–3927
27. Tian L, Tang Y, Hu L et al (2020) Domain adaptation by class centroid matching and local manifold self-learning. IEEE Trans Image Process 29:9703–9718
28. Wang Y, Nie L, Li Y et al (2020) Soft large margin clustering for unsupervised domain adaptation. Knowl-Based Syst 192:105344
29. Wang J, Chen Y, Feng W et al (2020) Transfer learning with dynamic distribution adaptation. ACM Trans Intell Syst Technol 11(1):1–25
30. Gong B, Shi Y, Sha F et al (2012) Geodesic flow kernel for unsupervised domain adaptation. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 2066–2073
31. Schölkopf B, Herbrich R, Smola AJ (2001) A generalized representer theorem. In: International Conference on Computational Learning Theory. Springer, Berlin, Heidelberg, pp 416–426
32. Saenko K, Kulis B, Fritz M et al (2010) Adapting visual category models to new domains. In: European Conference on Computer Vision. Springer, Berlin, Heidelberg, pp 213–226
33. Donahue J, Jia Y, Vinyals O et al (2014) DeCAF: a deep convolutional activation feature for generic visual recognition. In: International Conference on Machine Learning. PMLR, pp 647–655
34. He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
35. Zhang J, Li W, Ogunbona P (2017) Joint geometrical and statistical alignment for visual domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5150–5158
36. Long M, Cao Y, Wang J et al (2015) Learning transferable features with deep adaptation networks. In: International Conference on Machine Learning. PMLR, pp 97–105
37. Ganin Y, Ustinova E, Ajakan H et al (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17:59
38. Long M, Zhu H, Wang J, Jordan MI (2017) Deep transfer learning with joint adaptation networks. In: Proceedings of the International Conference on Machine Learning, pp 2208–2217
39. Zhang W, Ouyang W, Li W et al (2018) Collaborative and adversarial network for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3801–3809
40. Long M, Cao Z, Wang J et al (2018) Conditional adversarial domain adaptation. In: Advances in Neural Information Processing Systems, pp 1647–1657
41. Fang X, Bai H, Guo Z et al (2020) DART: domain-adversarial residual-transfer networks for unsupervised cross-domain image classification. Neural Netw 127:182–192
42. Zhu Y, Zhuang F, Wang J et al (2019) Multi-representation adaptation network for cross-domain image classification. Neural Netw 119:214–221
43. Luo Y, Ren C, Dai D et al (2022) Unsupervised domain adaptation via discriminative manifold propagation. IEEE Trans Pattern Anal Mach Intell 44(3):1653–1669
Metadata
Title
Distribution matching and structure preservation for domain adaptation
Authors
Ping Li
Zhiwei Ni
Xuhui Zhu
Juan Song
Publication date
13.10.2022
Publisher
Springer International Publishing
Published in
Complex & Intelligent Systems / Issue 2/2023
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI
https://doi.org/10.1007/s40747-022-00887-3
