
Open Access 01-12-2021 | Research

Dissimilarity space reinforced with manifold learning and latent space modeling for improved pattern classification

Authors: Azadeh Rezazadeh Hamedani, Mohammad Hossein Moattar, Yahya Forghani

Published in: Journal of Big Data | Issue 1/2021


Abstract

Dissimilarity representation plays a very important role in pattern recognition due to its ability to capture structural and relational information between samples. Dissimilarity space embedding is an approach in which each sample is represented as a vector based on its dissimilarity to some other samples called prototypes. However, the lack of neighborhood preservation and the use of a fixed, usually large prototype set for all training samples cause low classification accuracy and high computational complexity. To address these challenges, our proposed method creates the dissimilarity space considering the neighbors of each data point on the manifold. For this purpose, Locally Linear Embedding (LLE) is used as an unsupervised manifold learning algorithm. The only goal of this step is to learn the global structure and the neighborhood of the data on the manifold; no mapping or dimension reduction is performed. In order to create the dissimilarity space, each sample is compared only with its prototype set, namely its k-nearest neighbors on the manifold, using the geodesic distance metric. The geodesic distance metric preserves the structure and is computed on the weighted LLE neighborhood graph. Finally, the Latent Space Model (LSM) is applied to map the dissimilarity space into a lower-dimensional Euclidean latent space, which resolves the second challenge. To evaluate the resulting representation, the so-called dissimilarity space, two common classifiers, namely K Nearest Neighbor (KNN) and Support Vector Machine (SVM), are applied. Experiments on different datasets, covering both Euclidean and non-Euclidean spaces, demonstrate that with the proposed approach the classifiers outperform the other basic dissimilarity spaces in both accuracy and runtime.
Abbreviations
LLE
Locally Linear Embedding
LSM
Latent Space Model
KNN
K Nearest Neighbor
SVM
Support Vector Machine
PCA
Principal Component Analysis
GA
Genetic Algorithm
MAP
Maximum a Posteriori
UCI
University of California Irvine
USPS
United States Postal Service
MNIST
Modified National Institute of Standards and Technology
SSDS
Structured Sparse Dictionary Selection

Introduction

In dissimilarity space embedding, each pattern is mapped into a vector space based on its differences from other patterns. The main characteristic of this method is that any dissimilarity measure, whether metric or non-metric, Euclidean or non-Euclidean, symmetric or asymmetric, can be used [1]. Research in this area is important for two reasons. Firstly, this method complements the feature-based representation [2], and secondly, it serves as a bridge between statistical and structural approaches in pattern recognition [3].
One of the most challenging points of this embedding is the prototype selection approach, which includes the selection strategy and the number of prototypes. It has a very significant impact on the computational complexity of measuring dissimilarity, the dimensionality of the final embedding space and, subsequently, the classifier's performance. Another significant issue is the lack of attention to the data structure and to preserving the neighborhood of samples while constructing the dissimilarity space and embedding into it.
Our contribution in this paper is to provide a method for selecting prototypes that considers the manifold and data structure, and then to reduce the dimensionality and complexity of the proposed dissimilarity space. In the proposed method, the prototype set P is not fixed for all samples; each sample has its own specific prototypes, which are its neighbors on the manifold. So, a sample is not forced to measure its differences from all other members of the prototype set. Also, the dimensionality of the proposed dissimilarity space is reduced by the Latent Space Model. In this way, the set P is pruned and the dimensionality of the dissimilarity space is reduced, so the computational complexity is likely to decrease while the performance of the classifiers in terms of accuracy and runtime can be improved.
The rest of the paper is organized as follows. “Related works” section describes the related works in this area. Then, in “Background knowledge” section, explanations are given about the LLE method, geodesic distance and the latent space model as the tools used in the proposed method. Then the proposed approach is fully explained step-by-step in “The proposed method” section. In “Experimental results and discussion” section, the evaluation datasets and the evaluation scenarios are discussed and the results are analyzed. Finally, “Conclusion” section summarizes the whole paper and proposes some guidance for the future works.
Related works

In 1985, Goldfrab proposed, for the first time, to represent structural data by computing the differences between samples rather than by a feature-based representation. Since the approach was not practical at the time, it did not attract much attention from researchers. Pseudo-Euclidean space was also proposed by Goldfrab in the same year [3, 4]. After 1995, Duin and Pȩkalska [1] began their research in this area, and eventually in 2005 introduced a distance-based method called the dissimilarity representation as a substitute for the feature-based representation. The dissimilarity space was first presented in 2008 by Pȩkalska [5] as a way to map structural data to a vector space, so that traditional learning algorithms can be applied. Also, in 2012, the dissimilarity space was described as a bridge between structural and statistical approaches [3].
In recent years, the dissimilarity space has been applied in some major areas such as classification of images and spectrograms [6, 7], handwritten digits [2, 3, 8–13], radiological image processing to detect autoimmune diseases [14], classification of biological arrays such as microarray data [15], 3D streamlines of the human brain [16], spectrum representation [17], time series classification [13, 18, 47] and graph representation and classification [19–21].
Nowadays, deep learning models are the pioneers in surveillance applications, but to achieve higher performance in person re-identification, the dissimilarity space is a better approach [22, 23]. In [22], dissimilarity representation is used to identify the differences between source and target captured videos. A dissimilarity-based loss function, optimized by gradient descent, is also suggested for re-identification systems.
Prototype selection has an important role in reducing the dimensionality of the dissimilarity space. In prototype selection approaches, the most diverse samples have to be selected as prototypes, and among similar samples only one is selected [24]. In 2006, Pȩkalska et al. [25] proposed some methods for pruning the prototype set (P) such as Random, RandomC, KCenters, and EditCon. Also, Calaña et al. [26] used a genetic algorithm for prototype selection. In 2015, a method for generating and selecting prototypes for structural data was presented [12]. Principal Component Analysis (PCA) has been used to reduce the dimensionality of the dissimilarity space [18], and a Genetic Algorithm (GA) has been applied in an approximate search method in the dissimilarity space [27].
In 2021, Silva et al. proposed a prototype generation method that estimates the optimal number of prototypes [28]. In the first step, this algorithm generates a reduced training set with a Self-Organizing Map neural network; in the next step, it uses a classifier based on the nearest informative neighbors.
Recent prototype selection approaches applied to multi-label data classification [29], click fraud detection [30] and business process model generation [31] have been proposed in the last two years. The method in [30] first divides the imbalanced dataset into four groups and applies an under-sampling technique to balance it. In the second step, only relevant samples are chosen as prototypes in order to reduce the size of the training set.
In [48], a novel prototype learning approach is proposed as a modified version of zero-shot learning, which learns prototypes from unseen classes via training a classification model. It is a simple and efficient way to learn prototypes from class-level semantic information.
In all previously studied works, the prototype set P is constant for all samples. That is, every sample must necessarily be compared to all the members of the prototype set P before the dissimilarity space is created. Therefore, in this paper the proposed set P is defined dynamically and unevenly across samples. In other words, a sample is not required to be compared with all the prototypes in the set P to create the dissimilarity space; instead, considering the data structure and the neighborhood of each sample on the manifold, it is compared only with a limited set of members. The main goals of this new framework are to reduce the complexity and dimensionality of the space and also to achieve higher classification performance and lower runtime.
Similar data which belong to the same class have small differences, so they are closer to each other in the dissimilarity space, and vice versa. As a result, it is likely that samples of different classes will be more distinct in this space, so the construction of classifiers is better justified [25]. The most commonly used classifiers in the dissimilarity space are KNN and linear SVM [3, 7, 14, 18]. In spite of the existence of higher-performance classifiers, the KNN classifier is one of the most widely used classifiers in this space because of its simplicity and good behavior in metric-based spaces. Therefore, in this paper the KNN and SVM classifiers are chosen to evaluate our proposed method.

Background knowledge

Locally linear embedding (LLE)

This learning algorithm recognizes the fundamental structure of the manifold and reduces dimensionality. LLE is nonlinear, based on eigenvectors, and preserves the local neighborhood of samples [32]. It has three main steps:
  • Finding \(l\) neighbors for each \({x}_{i}\) sample with D dimensions.
  • Reconstructing each sample based on the weighted linear combination of its \(l\) nearest neighbors and obtaining the optimal weights \({W}_{ij}\) by minimizing the following equation. In other words, a weighted neighborhood graph is constructed in this step and \({W}_{ij}=0\) if \({x}_{j}\) is not a neighbor of \({x}_{i}\) [33]:
    $$\varepsilon \left(W\right)={\sum }_{i}{\left|{\overrightarrow{x}}_{i}-\sum_{j=1}^{l}{W}_{ij}{\overrightarrow{x}}_{j}\right|}^{2},\qquad {\sum }_{j}{W}_{ij}=1,\qquad \left\{{x}_{j}\right\}: l \mathrm{\ nearest\ neighbors\ of\ } {x}_{i}$$
    (1)
  • With the optimal weights \({W}_{ij}\) and minimizing the following equation, new embedded \({y}_{i}\) vectors with d dimensions (d < D) can be obtained [32].
    $$\Phi \left(Y\right)={\sum }_{i}{\left|{\overrightarrow{y}}_{i}-\sum_{j=1}^{l}{W}_{ij}{\overrightarrow{y}}_{j}\right|}^{2},\qquad \left\{{y}_{j}\right\}: l \mathrm{\ nearest\ neighbors\ of\ } {y}_{i}$$
    (2)
Some of its advantages are simplicity of implementation, escaping from local minima, mapping high dimensional data into a single coordinate system of lower dimensions, and a faster rate in calculating the eigenvalues of the matrix \(M={\left(I-W\right)}^{T}\left(I-W\right)\) due to the fact that many weights are zero [32, 34].
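As a concrete illustration of these three steps, the short sketch below fits LLE with scikit-learn; the dataset, the neighbor count and the target dimensionality are only assumptions for the example and are not taken from the experiments in this paper.

```python
# Minimal LLE sketch with scikit-learn (illustrative parameter values).
from sklearn.datasets import load_iris
from sklearn.manifold import LocallyLinearEmbedding

X = load_iris().data                      # N x D feature matrix (here D = 4)

lle = LocallyLinearEmbedding(
    n_neighbors=10,                       # l: neighbors used for reconstruction (Eq. 1)
    n_components=2,                       # d: embedding dimensionality (Eq. 2), d < D
    method="standard",
)
Y = lle.fit_transform(X)                  # N x d embedded coordinates y_i

print(Y.shape)                            # (150, 2)
print(lle.reconstruction_error_)          # value of the minimized reconstruction cost
```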

Geodesic distance

In graph theory, the sequence of edges between two vertices is called a path. The length of a path between two vertices in an unweighted graph is the number of its edges, and in a weighted graph it is the sum of the weights of the edges between them. The distance between two vertices, which is equal to the length of the shortest path between them, is called the geodesic distance [35].
In this step, an undirected weighted graph is created according to the LLE manifold. Each point on the LLE manifold is assumed as a vertex (\({y}_{i}\)) and Euclidean distance of two adjacent vertices is the weight of their connected edge (Fig. 1).
In order to compute the geodesic distance between two vertices \({y}_{i}\) and \({y}_{j}\), if they are neighbors, the following equation is used:
$$gDist\left({y}_{i},{y}_{j}\right)\approx d\left({y}_{i},{y}_{j}\right)=\sqrt{{\left({y}_{i1}-{y}_{j1}\right)}^{2}+{\left({y}_{i2}-{y}_{j2}\right)}^{2}+\dots +{\left({y}_{id}-{y}_{jd}\right)}^{2}}$$
(3)
But if they are not neighbors, the length of the shortest path through intermediate vertices is taken as an approximation of the geodesic distance [33], e.g., for a path through \({y}_{j}\):
$$gDist\left({y}_{i},{y}_{k}\right)=d\left({y}_{i},{y}_{j}\right)+d\left({y}_{j},{y}_{k}\right)$$
(4)
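As an illustration, a small sketch of this computation, assuming the embedded points are available as a NumPy array Y: the weighted l-nearest-neighbor graph is built with scikit-learn and all-pairs shortest paths, i.e., the geodesic approximations of Eqs. 3 and 4, are obtained with SciPy's Dijkstra routine. The function and variable names are illustrative only.

```python
# Hypothetical sketch: approximate geodesic distances on the LLE neighborhood graph.
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def geodesic_distance_matrix(Y, l=10):
    """Approximate geodesic distances between embedded points Y (N x d)."""
    # Weighted graph: edge weight = Euclidean distance to each of the l nearest
    # neighbors (Eq. 3 for adjacent vertices).
    graph = kneighbors_graph(Y, n_neighbors=l, mode="distance")
    # Dijkstra shortest-path lengths approximate the geodesic distance for
    # non-adjacent vertices (Eq. 4); directed=False lets an edge be traversed
    # in either direction, so the k-NN graph is treated as undirected.
    return shortest_path(graph, method="D", directed=False)

# Example: gdist = geodesic_distance_matrix(Y, l=10)   # N x N distance matrix
```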

Latent Space Model (LSM)

Latent Space Model (LSM) was proposed in 2002 by Hoff et al. [36] to estimate the similarity between nodes and their connectivity in social networks. In this model, for each node \({x}_{i}\) in the \({X}_{N*N}\) input network, there is an embedded position \({z}_{i}\) in the unseen latent space \({Z}_{N*Q}\). This latent space has three characteristics: (1) Euclidean property: many traditional classification algorithms can be performed in this space. (2) Its dimensionality is lower than that of the initial input network \((Q\ll N)\), so this model can be used as a dimension reduction method. (3) The probability of the communication edge \({Y}_{ij}\) between the two nodes \({x}_{i}\) and \({x}_{j}\) in the network depends on the Euclidean distance between their positions in the latent space, \({z}_{i}\) and \({z}_{j}\) [36]. The process of the model is as follows:
  • Initialization The latent space \({Z}_{N*Q}\) is initialized randomly with a normal distribution with zero mean and unit variance.
    $${z}_{ij}=N\left(0,I\right)\quad i,j=1\dots N$$
    (5)
  • Inference In this step, the Maximum a Posteriori (MAP) method is used to estimate the hidden space Z under the model. Equation 6 estimates the latent variable \(Z\) by the MAP method, where X denotes the input network and Z the latent space.
    $${Z}_{MAP}={\mathrm{arg\,max}}_{Z}\,\mathrm{log}\,p\left(X|Z\right)={\mathrm{arg\,max}}_{Z}\sum_{i=1}^{N}\mathrm{log}\,p\left({x}_{i}|Z\right)={\mathrm{arg\,min}}_{Z}\left(-\frac{1}{N}\sum_{i=1}^{N}\mathrm{log}\,p\left({x}_{i}|Z\right)\right)$$
    (6)
    In order to estimate the hidden variable Z by MAP, the gradient descent optimization method is used for a fixed number of iterations until it converges to a local optimum [37]:
    $${\nabla }_{Z}\,\mathrm{log}\,p\left(X,Z\right)$$
    (7)
  • Estimate connected edges This step is optional and is needed only when analyzing the latent space and estimating connectivity patterns. Since the probability of an edge \({Y}_{ij}\) between the two nodes \({x}_{i}\) and \({x}_{j}\) in the network depends on the Euclidean distance between their positions in the latent space, \({z}_{i}\) and \({z}_{j}\), the quantity \(\left|{z}_{i}-{z}_{j}\right|\) is calculated. Then the probability of the edge \({Y}_{ij}\) is computed in terms of the Poisson probability distribution function.
    $${Y}_{ij}=Poisson(\frac{1}{\left|{z}_{i}-{z}_{j}\right|})$$
    (8)
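The paper performs this MAP estimation with the Edward library. Purely as an illustration of the idea, the sketch below carries out the same kind of gradient-based MAP estimation with PyTorch, assuming a Poisson likelihood whose rate is the inverse latent distance (as in Eq. 8) and a standard-normal prior on Z. The optimizer, learning rate and masking of the diagonal are assumptions, not the authors' implementation.

```python
# Simplified stand-in for the LSM inference step (Eqs. 5-7), not the Edward code.
import torch

def lsm_map(X, q=10, iters=100, lr=0.01):
    """MAP estimate of latent positions Z (N x q) for an N x N input matrix X."""
    X = torch.as_tensor(X, dtype=torch.float32)
    n = X.shape[0]
    Z = torch.randn(n, q, requires_grad=True)        # Eq. 5: random normal init
    mask = ~torch.eye(n, dtype=torch.bool)           # ignore the diagonal entries
    opt = torch.optim.Adam([Z], lr=lr)               # gradient-based optimizer
    for _ in range(iters):
        opt.zero_grad()
        diff = Z.unsqueeze(1) - Z.unsqueeze(0)       # N x N x q differences
        dist = torch.sqrt((diff ** 2).sum(-1) + 1e-9)    # pairwise |z_i - z_j|
        rate = 1.0 / dist                            # Poisson rate, as in Eq. 8
        nll = (rate - X * torch.log(rate))[mask].sum()   # Poisson negative log-lik.
        prior = 0.5 * (Z ** 2).sum()                 # standard-normal prior on Z
        (nll + prior).backward()                     # minimize the negative log posterior (Eq. 7)
        opt.step()
    return Z.detach().numpy()
```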

The proposed method

The framework of the proposed method as illustrated in Fig. 2 consists of five steps. In the following, each step of the approach is described as an algorithm.
In Algorithm 1, we use the Locally Linear Embedding (LLE) algorithm, which is one of the neighborhood-preserving manifold learning algorithms, to learn the data manifold. The input matrix \({X}_{N*D}\) contains the training samples, where \(N\) is the number of samples in the training set and \(D\) is the number of dimensions of each sample. The output is the matrix \({Y}_{N*d}\), where \(N\) is the number of samples and \(d\) is the dimension of the embedded samples, with d lower than D (\(d\ll D\)). The LLE algorithm consists of three steps, which are described in more detail in the "Locally linear embedding (LLE)" section.
In Algorithm 2, first an undirected weighted graph is constructed according to the data manifold obtained by the LLE algorithm. Each new embedded point on the LLE manifold is assumed to be a vertex (\({y}_{i}\)) that is connected to its \(l\) nearest neighbors by edges (\({e}_{ij}\)), and the Euclidean distance between them is considered as the weight of their connecting edge (\({w}_{ij}\)). Then the geodesic distance matrix is computed.
Algorithm 3 consists of two main steps: (1) define prototype set (2) construct dissimilarity space.
First, the geodesic distance matrix is sorted ascending and the k-nearest neighbors of each sample on the manifold are selected to define the prototype set \({P}_{i}\) of each sample \({x}_{i}\) (Eq. 9). It should also be noted that the farthest neighbor \({(\alpha }_{i})\) of each sample is also determined.
$${P}_{i}=\left\{{p}_{i1},\dots ,{p}_{ik}\right\}$$
(9)
In step 2, dissimilarity space is created as a matrix \({DS}_{N*N}\) whose diagonal elements are zero.
As the default value for entries, the distance between each sample and its farthest neighbor \({(\alpha }_{i})\) on the manifold is placed in the matrix \(DS\) (Eq. 11). Then, the Euclidean distance of each sample \({x}_{i}\) with its k-nearest neighbors on the manifold \(\left\{{p}_{i1}.\dots .{p}_{ik}\right\}\) is computed and is placed in \(DS\) (Eq. 10). It should be noted that we prefer to use the initial samples (\({X}_{N*D})\) before any dimension reduction, so the dissimilarity space is not constructed using manifold embedding \({Y}_{N*d}\).
$$d\left({x}_{i},{x}_{j}\right)={d}_{ij}=\sqrt{{\left({x}_{i1}-{x}_{j1}\right)}^{2}+\dots +{\left({x}_{iD}-{x}_{jD}\right)}^{2}}$$
(10)
$${DS}_{ij}= \left\{\begin{array}{ll} 0 & \mathrm{if}\ i=j\\ d\left({x}_{i},{x}_{j}\right) & \mathrm{if}\ j\in {P}_{i}\\ d\left({x}_{i},{\alpha }_{i}\right) & \mathrm{otherwise}\end{array}\right.$$
(11)
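A compact sketch of Algorithm 3 is given below, assuming the original samples X and the geodesic distance matrix gdist from Algorithm 2 are available, and that k is the per-sample prototype count (k_neighbor_manifold). The farthest neighbor α_i is taken here as the sample with the largest geodesic distance from x_i; that reading, and all variable names, are assumptions made for illustration.

```python
# Illustrative sketch of Algorithm 3: per-sample prototypes and the DS matrix (Eqs. 9-11).
import numpy as np
from scipy.spatial.distance import cdist

def build_dissimilarity_space(X, gdist, k):
    """X: N x D original samples, gdist: N x N geodesic distances, k: #prototypes."""
    n = X.shape[0]
    euclid = cdist(X, X)                          # Eq. 10 on the original features
    order = np.argsort(gdist, axis=1)             # sort each row ascending
    prototypes = order[:, 1:k + 1]                # Eq. 9: P_i (column 0 is the sample itself)
    farthest = order[:, -1]                       # alpha_i: farthest point on the manifold
    # Eq. 11: default d(x_i, alpha_i), overwritten for j in P_i, zero on the diagonal.
    ds = np.tile(euclid[np.arange(n), farthest][:, None], (1, n))
    rows = np.repeat(np.arange(n), k)
    ds[rows, prototypes.ravel()] = euclid[rows, prototypes.ravel()]
    np.fill_diagonal(ds, 0.0)
    return ds
```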
In Algorithm 4, the Latent Space Model (LSM) is used to simplify and reduce the dimensionality of the dissimilarity space. First, the latent space with N*Q dimensions, where N is the number of training samples and Q is lower than N, is initialized randomly with zero mean and unit variance. Then, in a loop with a user-defined number of iterations, the Maximum a Posteriori (MAP) method estimates the optimal positions in the latent space \({Z}_{N*Q}\). To implement this step, the Edward library [37], which is based on the Python programming language, is used. The final augmented dissimilarity space is obtained by the inner product of \({DS}_{N*N}\) and \({Z}_{N*Q}\) as in the following equation (Eq. 12).
$$D{S}_{Augmented}={DS}_{N*N}\bullet {Z}_{N*Q}$$
(12)
Finally, classifiers such as KNN and SVM are trained by the N-by-Q matrix of \(D{S}_{Augmented}\) and classify new queries which are mapped into the augmented space. In order to evaluate the results, accuracy and runtime are considered as classification performance measurements.
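Putting the last two steps together, the hedged sketch below projects the N×N dissimilarity matrix through the estimated latent positions (Eq. 12) and evaluates the two classifiers on the resulting N×Q representation with cross validation. It reuses the hypothetical helpers sketched earlier and, for brevity, treats the whole augmented matrix as one feature table, glossing over how out-of-sample queries are mapped.

```python
# Illustrative use of the augmented dissimilarity space (Eq. 12) for classification.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def evaluate_augmented_space(ds, z, labels, cv=10):
    """ds: N x N dissimilarity matrix, z: N x Q latent positions, labels: length N."""
    ds_augmented = ds @ z                         # Eq. 12: DS . Z  ->  N x Q features
    classifiers = {
        "1-nn": KNeighborsClassifier(n_neighbors=1),
        "linSVM": SVC(kernel="linear"),
    }
    return {name: cross_val_score(clf, ds_augmented, labels, cv=cv).mean()
            for name, clf in classifiers.items()}

# Example (with the earlier hypothetical helpers):
#   ds = build_dissimilarity_space(X, gdist, k=20)
#   z = lsm_map(ds, q=75, iters=100)
#   print(evaluate_augmented_space(ds, z, y))
```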

Experimental results and discussion

Data sets

Since the proposed method is not limited to a particular problem, nine Euclidean and non-Euclidean data sets, resulting in 10 dissimilarity spaces, are used to evaluate the approach (more than one measure and different numbers of classes and samples are considered in these dissimilarity spaces). Classification datasets from the University of California Irvine (UCI) machine learning repository [38] such as iris, glass, wine, breast cancer, monk, ionosphere radar signals and 8 × 8 images of optical recognition of handwritten digits are used in this paper. USPS (United States Postal Service) [39] and MNIST (Modified National Institute of Standards and Technology) [40] are two other standard data sets for the handwritten digit recognition problem, and they are also chosen to evaluate our approach. The images in USPS are 16 × 16 pixels, and they are represented as records with 256 features. Also, Euclidean and two-way tangent [41, 42] measures are chosen to construct the dissimilarity spaces of the USPS data set. MNIST includes 70,000 handwritten digits which are centered in images with 28 × 28 pixels. The general information about these datasets is given in Table 1.
Table 1
Descriptions of the datasets

| Dataset | Measure type | # Records | # Dimensions | # Classes |
|---|---|---|---|---|
| Iris | Euclidean | 150 | 4 | 3 |
| Glass | Euclidean | 214 | 9 | 7 |
| Ionosphere | Euclidean | 351 | 34 | 2 |
| Monk | Euclidean | 122 | 6 | 2 |
| Breast Cancer | Euclidean | 569 | 30 | 2 |
| Wine | Euclidean | 178 | 13 | 3 |
| USPS_nE | Two-way tangent | 250 | 256 | 10 |
| USPS_E | Euclidean | 500 | 256 | 10 |
| Digit5 | Euclidean | 200 | 64 | 5 |
| MNIST | Euclidean | 70,000 | 784 | 10 |

Performance measurement

In this study, the accuracy measure, which is the most intuitive performance measure, is used to evaluate the classification performance of the proposed approach, and it is reported as the mean and standard deviation under K-fold cross validation as the evaluation method. This measure, given in Eq. 13, is the ratio of correctly classified samples (t), i.e., true positives plus true negatives, to the total number of samples (n) [43].
$$Accuracy=\frac{t}{n}*100$$
(13)
The F1-score is another metric used in this paper as a measure of the test's accuracy; it is calculated from precision and recall as shown in Eq. 14, where tp, fp and fn denote the true positive, false positive and false negative results [44].
$${F}_{1}=2*\frac{precision*recall}{precision+recall}=2*\frac{tp}{tp+\frac{1}{2}(fp+fn)}$$
(14)
In addition, the efficiency of the proposed method is also evaluated in terms of runtime. The runtimes have been obtained on a laptop with an Intel® Core™ i5-3230 M CPU @ 2.8 GHz with 8 GB RAM.
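For completeness, a small sketch of how these measures and the runtime can be obtained with scikit-learn is shown below; the macro averaging mode for the multi-class F1-score is an assumption, since the paper does not state it.

```python
# Sketch of the reported measures: accuracy (Eq. 13), F1-score (Eq. 14) and runtime.
import time
from sklearn.metrics import accuracy_score, f1_score

def report(clf, x_train, y_train, x_test, y_test):
    start = time.time()
    y_pred = clf.fit(x_train, y_train).predict(x_test)
    runtime = time.time() - start                       # training + prediction time
    acc = 100.0 * accuracy_score(y_test, y_pred)        # t / n * 100
    f1 = f1_score(y_test, y_pred, average="macro")      # averaging mode assumed
    return acc, f1, runtime
```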

Evaluation scenarios

Four scenarios are considered for the evaluation of the proposed approach:
1. Evaluating the whole proposed framework: A new dissimilarity space is constructed according to the proposed algorithm using the predefined prototype number list for each dataset (Table 2), and it is embedded into the lower-dimensional latent space. Then classifiers such as KNN with different K values (i.e., 1, 3, 5, 7, 11), linear SVM and polynomial SVM (degree = 2, 3) are used to evaluate the method in terms of accuracy and runtime.
2. Comparing with basic dissimilarity spaces: The proposed method is compared in both accuracy and runtime with the basic dissimilarity spaces defined below (a construction sketch of these baselines follows this list):
  • DS_ALL: All samples in the training set are selected as the representatives. In other words, the representative set R is equal to the training set [3].
  • DS_Random: K samples are randomly selected as the representatives [25].
  • DS_RandomC: From each class C, k random samples are selected as the representatives (K = k × C) [25].
3. Evaluating the LSM phase: In this case the classifiers are trained on the dissimilarity space \({DS}_{N*N}\) without any dimension reduction or mapping into the latent space. Doing so, the impact of the LSM and dimension reduction on the performance of the classifiers is determined.
4. Comparing with the recent similar works: The highest classification performance of the proposed method is compared with three similar recent publications in terms of accuracy on the same datasets. More details about these papers are given in the "Comparing with the recent approaches" section.
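As referenced in scenario 2, a hedged sketch of how the three baseline dissimilarity spaces can be constructed is shown below; the Euclidean measure, the prototype counts and the random seed are illustrative assumptions.

```python
# Illustrative construction of the baseline dissimilarity spaces.
import numpy as np
from scipy.spatial.distance import cdist

def ds_all(X):
    """DS_ALL: dissimilarities to every training sample."""
    return cdist(X, X)

def ds_random(X, k, seed=0):
    """DS_Random: dissimilarities to K randomly chosen prototypes."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)
    return cdist(X, X[idx])

def ds_random_c(X, y, k_per_class, seed=0):
    """DS_RandomC: k random prototypes per class (K = k x C prototypes in total)."""
    rng = np.random.default_rng(seed)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=k_per_class, replace=False)
        for c in np.unique(y)
    ])
    return cdist(X, X[idx])
```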
 
Table 2
Parameters of the proposed method

| Dataset | LLE (k_LLE) | LSM (lsm_dimension) | LSM (lsm_iteration) | K-fold cross validation | Prototype numbers (k_neighbor_manifold) |
|---|---|---|---|---|---|
| Iris | 55 | 75 | 100 | 10 | 1, 20, 75, 100, 149 |
| Glass | 10 | 107 | 100 | 10 | 1, 10, 60, 140, 213 |
| Ionosphere | 35 | 200 | 100 | 10 | 1, 20, 50, 100, 350 |
| Monk | 10 | 120 | 100 | 10 | 1, 20, 60, 90, 121 |
| Breast Cancer | 35 | 500 | 100 | 10 | 1, 10, 20, 50, 100 |
| Wine | 15 | 140 | 100 | 10 | 1, 10, 20, 50, 80 |
| USPS_nE | 30 | 125 | 100 | 10 | 1, 20, 40, 60, 249 |
| USPS_E | 10 | 420 | 100 | 10 | 1, 20, 100, 120, 150 |
| Digit5 | 20 | 160 | 100 | 10 | 1, 10, 20, 100, 150 |
| MNIST | 100 | 2000 | 100 | 10 | 1, 1000, 5000, 10,000 |

Results

Table 2 shows the values of the required parameters for each dataset, including the number of neighbors needed to learn the LLE manifold (k_LLE), the latent space dimensions (lsm_dimension), the number of iterations of the estimation in LSM (lsm_iteration), the number of folds in the cross validation (K-fold) evaluation scheme, and the number of neighbors of each sample on the manifold that should be considered as prototypes in the proposed method (k_neighbor_manifold). It should be noted that in this study the optimal number of nearest neighbors for LLE (k_LLE) is determined with the approach introduced in [45], and the other parameters are tuned by trial and error.

Evaluating the whole proposed framework

As described in the previous sections, the Locally Linear Embedding (LLE) algorithm learns the data manifold. Figure 3 shows the LLE embedded neighborhood graph with the optimal number of neighbors.
The LLE embedded neighborhood graph of the MNIST data is cluttered due to its huge number of samples, so its manifold based on the first two features is shown in Fig. 4.
To compute the geodesic distance, the undirected weighted graph is constructed based on the obtained LLE manifold. In this graph, as suggested in [45], each point is connected to a small fixed number of neighbors (\(l=10\)).
Tables 3 and 4 report the accuracy and runtime of the proposed method on the Iris dataset. The numerical value in each column header is the number of nearest neighbors of each sample on the manifold that are taken as prototypes for constructing the dissimilarity space (k_neighbor_manifold). The results of the linear SVM show that the maximum accuracy achieved by our proposed method, comparing each point to only its 20 nearest neighbors on the manifold, is 98% at 0.01 s, while the accuracies of the other classifiers in dissimilarity spaces with different numbers of prototypes are lower than 98%.
Table 3
Evaluation of the proposed method on the Iris dataset in terms of accuracy (%) with tenfold cross validation (best results denoted in italics typeface)

| Classifier \ #Prototypes | 1 | 20 | 75 | 100 | 149 |
|---|---|---|---|---|---|
| 1-nn | 96.00 (± 1.2) | 95.33 (± 0.8) | 96.00 (± 0.2) | 96.00 (± 0.8) | 96.00 (± 0.7) |
| 3-nn | 94.00 (± 2.1) | 95.33 (± 1.1) | 94.00 (± 1.2) | 96.67 (± 1.6) | 96.00 (± 1.4) |
| 5-nn | 95.33 (± 2.3) | 95.33 (± 1.3) | 93.33 (± 2.4) | 96.67 (± 1.4) | 95.33 (± 1.2) |
| 7-nn | 91.33 (± 3.3) | 96.00 (± 1.1) | 92.67 (± 3.2) | 96.00 (± 1.9) | 97.33 (± 0.5) |
| 11-nn | 94.00 (± 2.5) | 96.00 (± 0.9) | 94.67 (± 2.1) | 96.67 (± 1.3) | 96.67 (± 1.2) |
| linSVM | 94.67 (± 1.1) | 98.00 (± 0.2) | 95.33 (± 2.3) | 95.33 (± 2.3) | 95.33 (± 2.3) |
| polySVM(2) | 94.67 (± 1.3) | 97.33 (± 0.8) | 95.33 (± 1.2) | 94.67 (± 3.1) | 95.33 (± 1.2) |
| polySVM(3) | 94.67 (± 2.3) | 97.33 (± 1.5) | 95.33 (± 1.4) | 94.00 (± 3.2) | 94.67 (± 1.3) |
| Average | 94.33 (± 1.38) | 96.33 (± 1.07) | 94.58 (± 1.15) | 95.75 (± 1.0) | 95.83 (± 0.85) |
Table 4
Evaluation of the proposed method on the Iris dataset in terms of runtime (s)

| Classifier \ #Prototypes | 1 | 20 | 75 | 100 | 149 |
|---|---|---|---|---|---|
| 1-nn | 0.10 | 0.09 | 0.09 | 0.09 | 0.11 |
| 3-nn | 0.10 | 0.09 | 0.11 | 0.09 | 0.11 |
| 5-nn | 0.11 | 0.09 | 0.10 | 0.09 | 0.13 |
| 7-nn | 0.10 | 0.09 | 0.09 | 0.09 | 0.09 |
| 11-nn | 0.12 | 0.09 | 0.09 | 0.08 | 0.09 |
| linSVM | 0.01 | 0.01 | 0.00 | 0.01 | 0.01 |
| polySVM(2) | 0.01 | 0.00 | 0.00 | 0.01 | 0.01 |
| polySVM(3) | 0.01 | 0.00 | 0.00 | 0.01 | 0.01 |
| Average | 0.10 | 0.09 | 0.09 | 0.09 | 0.11 |
Figure 5 demonstrates the classification error curves of the proposed method versus the number of prototypes of each sample on the manifold for the 10 evaluation dissimilarity spaces; the optimal prototype numbers for each dataset are also shown in this figure.

Comparing with the basic dissimilarity spaces

In Table 5 the best and the average accuracy of the proposed method on each dataset, and in Table 6 the best and the average F1-score, are compared with the other basic dissimilarity spaces and the initial dataset. The numerical values in parentheses indicate the optimal number of prototypes and the best classifier on each dataset. It should be noted that the number of random prototypes in DS_Random and DS_RandomC is equal to the optimal number of prototypes in the proposed method.
Table 5
Classification accuracy of the proposed method and other basic dissimilarity spaces (± standard deviation)

| Dataset | Result | Proposed Algorithm | DS_All | DS_Random | DS_RandomC | Initial dataset |
|---|---|---|---|---|---|---|
| Iris | Best (20, linSVM) | 98.00 (± 0.2) | 96.00 (± 2.6) | 96.67 (± 2.7) | 96.67 (± 2.7) | 97.33 (± 0.05) |
| Iris | Average | 96.33 (± 1.7) | 95.92 (± 2.7) | 96.17 (± 2.8) | 95.83 (± 3.2) | 95.92 (± 0.04) |
| Glass | Best (60, 1-NN) | 77.57 (± 2.1) | 73.83 (± 6.2) | 72.90 (± 6.1) | 72.43 (± 5.3) | 69.16 (± 5.8) |
| Glass | Average | 71.61 (± 2.4) | 70.09 (± 5.6) | 69.68 (± 5.8) | 70.15 (± 5.6) | 67.00 (± 7.5) |
| Ionosphere | Best (50, 5-NN) | 95.73 (± 1.1) | 92.59 (± 2.8) | 92.59 (± 2.2) | 91.74 (± 2.6) | 84.33 (± 3.6) |
| Ionosphere | Average | 93.38 (± 1.5) | 93.36 (± 2.1) | 93.02 (± 2.4) | 92.45 (± 3.6) | 82.59 (± 3.8) |
| Monk | Best (60, polySVM2) | 92.62 (± 0.4) | 90.98 (± 1.2) | 90.98 (± 1.2) | 90.98 (± 1.3) | 90.80 (± 1.3) |
| Monk | Average | 89.55 (± 0.9) | 89.14 (± 0.4) | 87.86 (± 1.1) | 89.14 (± 0.3) | 82.07 (± 0.8) |
| BreastCancer | Best (50, 11-NN) | 94.20 (± 0.8) | 92.27 (± 3.2) | 91.56 (± 3.9) | 92.27 (± 2.8) | 96.49 (± 0.5) |
| BreastCancer | Average | 91.83 (± 0.7) | 91.55 (± 0.4) | 87.81 (± 1.8) | 89.26 (± 1.2) | 89.98 (± 1.5) |
| Wine | Best (10, linSVM) | 74.72 (± 1.2) | 80.34 (± 2.1) | 73.03 (± 3.5) | 74.72 (± 4.2) | 96.07 (± 0.5) |
| Wine | Average | 74.37 (± 1.8) | 72.89 (± 3.2) | 65.17 (± 6.1) | 67.98 (± 4.5) | 96.35 (± 0.4) |
| USPS_nE | Best (60, linSVM) | 96.80 (± 0.2) | 94.40 (± 0.01) | 88.00 (± 2.3) | 87.60 (± 3.1) | 87.60 (± 2.4) |
| USPS_nE | Average | 93.55 (± 1.5) | 72.60 (± 3.2) | 70.50 (± 3.5) | 70.45 (± 3.1) | 82.55 (± 2.4) |
| USPS_E | Best (120, linSVM) | 91.40 (± 1.0) | 89.80 (± 0.8) | 90.40 (± 0.4) | 89.00 (± 0.7) | 90.40 (± 0.7) |
| USPS_E | Average | 88.03 (± 0.1) | 88.05 (± 0.9) | 87.05 (± 0.7) | 86.10 (± 1.4) | 85.80 (± 0.4) |
| Digit5 | Best (10, linSVM) | 99.50 (± 0.1) | 99.50 (± 0.07) | 97.50 (± 0.7) | 96.50 (± 0.8) | 99.50 (± 0.00) |
| Digit5 | Average | 99.19 (± 0.1) | 99.25 (± 0.9) | 96.50 (± 1.1) | 96.75 (± 0.7) | 98.69 (± 0.4) |
| MNIST | Best (5000, 1-NN) | 94.05 (± 3.5) | 96.67 (± 0.8) | 95.67 (± 0.4) | 83.33 (± 2.1) | 97.30 (± 0.2) |
| MNIST | Average | 90.46 (± 4.6) | 94.21 (± 0.9) | 94.25 (± 0.5) | 82.50 (± 2.3) | 93.29 (± 1.01) |
Table 6
Classification F1-score of the proposed method and other basic dissimilarity spaces

| Dataset | Result | Proposed Algorithm | DS_All | DS_Random | DS_RandomC | Initial dataset |
|---|---|---|---|---|---|---|
| Iris | Best (20, linSVM) | 0.98 (± 0.03) | 0.95 (± 0.03) | 0.96 (± 0.02) | 0.96 (± 0.02) | 0.93 (± 0.03) |
| Iris | Average | 0.95 (± 0.03) | 0.88 (± 0.04) | 0.88 (± 0.03) | 0.88 (± 0.03) | 0.94 (± 0.02) |
| Glass | Best (60, 1-NN) | 0.56 (± 0.32) | 0.56 (± 0.3) | 0.56 (± 0.3) | 0.54 (± 0.3) | 0.53 (± 0.2) |
| Glass | Average | 0.50 (± 0.3) | 0.49 (± 0.3) | 0.46 (± 0.3) | 0.46 (± 0.3) | 0.45 (± 0.3) |
| Ionosphere | Best (50, 5-NN) | 0.94 (± 0.03) | 0.91 (± 0.03) | 0.91 (± 0.03) | 0.90 (± 0.04) | 0.82 (± 0.1) |
| Ionosphere | Average | 0.92 (± 0.03) | 0.92 (± 0.03) | 0.91 (± 0.03) | 0.90 (± 0.03) | 0.83 (± 0.09) |
| Monk | Best (60, polySVM2) | 0.89 (± 0.00) | 0.88 (± 0.00) | 0.85 (± 0.01) | 0.93 (± 0.00) | 0.92 (± 0.00) |
| Monk | Average | 0.87 (± 0.01) | 0.88 (± 0.00) | 0.88 (± 0.00) | 0.89 (± 0.00) | 0.81 (± 0.00) |
| BreastCancer | Best (50, 11-NN) | 0.91 (± 0.03) | 0.91 (± 0.03) | 0.90 (± 0.03) | 0.90 (± 0.03) | 0.90 (± 0.03) |
| BreastCancer | Average | 0.8 (± 0.08) | 0.82 (± 0.08) | 0.85 (± 0.07) | 0.83 (± 0.07) | 0.86 (± 0.06) |
| Wine | Best (10, linSVM) | 0.69 (± 0.17) | 0.80 (± 0.05) | 0.70 (± 0.13) | 0.69 (± 0.12) | 0.96 (± 0.02) |
| Wine | Average | 0.6 (± 0.2) | 0.62 (± 0.17) | 0.63 (± 0.20) | 0.61 (± 0.20) | 0.81 (± 0.02) |
| USPS_nE | Best (60, linSVM) | 0.89 (± 0.17) | 0.85 (± 0.2) | 0.75 (± 0.2) | 0.78 (± 0.2) | 0.79 (± 0.19) |
| USPS_nE | Average | 0.83 (± 0.2) | 0.85 (± 0.19) | 0.80 (± 0.24) | 0.81 (± 0.19) | 0.76 (± 0.18) |
| USPS_E | Best (120, linSVM) | 0.86 (± 0.09) | 0.89 (± 0.06) | 0.85 (± 0.09) | 0.83 (± 0.09) | 0.85 (± 0.09) |
| USPS_E | Average | 0.78 (± 0.11) | 0.73 (± 0.08) | 0.81 (± 0.11) | 0.80 (± 0.12) | 0.83 (± 0.08) |
| Digit5 | Best (10, linSVM) | 0.98 (± 0.01) | 0.98 (± 0.01) | 0.95 (± 0.01) | 0.94 (± 0.05) | 0.98 (± 0.01) |
| Digit5 | Average | 0.98 (± 0.01) | 0.92 (± 0.02) | 0.95 (± 0.02) | 0.95 (± 0.03) | 0.98 (± 0.01) |
| MNIST | Best (5000, 1-NN) | 0.85 (± 0.02) | 0.90 (± 0.02) | 0.89 (± 0.03) | 0.92 (± 0.02) | 0.98 (± 0.00) |
| MNIST | Average | 0.83 (± 0.02) | 0.91 (± 0.01) | 0.90 (± 0.02) | 0.91 (± 0.01) | 0.97 (± 0.01) |
Figures 6 and 7 demonstrate the comparison results of the proposed method and other dissimilarity spaces based on the best and the average accuracy, respectively.

Evaluating the LSM phase

In this section, the proposed method is evaluated in terms of accuracy and runtime with and without incorporating the LSM approach. This makes it possible to determine the impact and importance of using the LSM on the classifiers' efficiency.
In Table 7, the best and average accuracy of the proposed method with and without the LSM are shown for all datasets; they indicate the superiority of the proposed method over the variant without LSM in most cases, which is due to the effect of the LSM on reducing the size of the dissimilarity space.
Table 7
Classification accuracy of the proposed method with and without LSM

| Dataset | Result | Proposed Algorithm: Accuracy (± std) | Proposed Algorithm: F1 | Without LSM: Accuracy (± std) | Without LSM: F1 |
|---|---|---|---|---|---|
| Iris | Best (20, linSVM) | 98.00 (± 0.2) | 0.98 (± 0.03) | 98.00 (± 0.0) | 0.96 (± 0.03) |
| Iris | Average | 96.33 (± 1.7) | 0.95 (± 0.03) | 96.67 (± 0.1) | 0.96 (± 0.03) |
| Glass | Best (60, 1-NN) | 77.57 (± 2.1) | 0.56 (± 0.32) | 76.64 (± 1.5) | 0.50 (± 0.30) |
| Glass | Average | 71.61 (± 2.4) | 0.50 (± 0.3) | 71.85 (± 2.5) | 0.42 (± 0.60) |
| Ionosphere | Best (50, 5-NN) | 95.73 (± 1.1) | 0.94 (± 0.03) | 91.45 (± 1.3) | 0.94 (± 0.07) |
| Ionosphere | Average | 93.38 (± 1.5) | 0.92 (± 0.03) | 93.16 (± 0.2) | 0.83 (± 0.07) |
| Monk | Best (60, polySVM2) | 92.62 (± 0.4) | 0.89 (± 0.00) | 90.98 (± 1.1) | 0.87 (± 0.00) |
| Monk | Average | 89.55 (± 0.9) | 0.87 (± 0.01) | 88.52 (± 0.7) | 0.86 (± 0.01) |
| BreastCancer | Best (50, 11-NN) | 94.20 (± 0.8) | 0.91 (± 0.03) | 94.20 (± 0.1) | 0.91 (± 0.03) |
| BreastCancer | Average | 91.83 (± 0.7) | 0.8 (± 0.08) | 91.40 (± 0.6) | 0.73 (± 0.09) |
| Wine | Best (10, linSVM) | 74.72 (± 1.2) | 0.69 (± 0.17) | 75.28 (± 0.5) | 0.68 (± 0.18) |
| Wine | Average | 74.37 (± 1.8) | 0.6 (± 0.2) | 73.53 (± 1.1) | 0.5 (± 0.2) |
| USPS_nE | Best (60, linSVM) | 96.80 (± 0.2) | 0.89 (± 0.17) | 94.40 (± 0.9) | 0.90 (± 0.11) |
| USPS_nE | Average | 93.55 (± 1.5) | 0.83 (± 0.2) | 74.10 (± 2.3) | 0.85 (± 0.17) |
| USPS_E | Best (120, linSVM) | 91.40 (± 1.0) | 0.86 (± 0.09) | 90.60 (± 0.3) | 0.80 (± 0.12) |
| USPS_E | Average | 88.03 (± 0.1) | 0.78 (± 0.11) | 87.25 (± 0.2) | 0.79 (± 0.14) |
| Digit5 | Best (10, linSVM) | 99.50 (± 0.1) | 0.98 (± 0.01) | 99.50 (± 0.0) | 0.97 (± 0.01) |
| Digit5 | Average | 99.19 (± 0.1) | 0.98 (± 0.01) | 99.06 (± 0.1) | 0.98 (± 0.01) |
| MNIST | Best (50, linSVM) | 94.05 (± 3.5) | 0.85 (± 0.02) | 88.88 (± 5.03) | 0.75 (± 0.03) |
| MNIST | Average | 90.46 (± 4.6) | 0.83 (± 0.02) | 85.16 (± 4.7) | 0.73 (± 0.02) |
Figures 8 and 9 show the comparison results of the proposed method with and without LSM based on the best and the average accuracy, respectively.
Also, in terms of runtime, as reported in Table 8, the proposed method requires less time for classification. In Figs. 10 and 11 the comparison results between the two cases are demonstrated more clearly. Since the classification runtime of the MNIST dataset is not comparable to the other datasets due to its large scale, it is not shown in Figs. 10 and 11 and is only reported in Table 8.
Table 8
Classification runtime of the proposed method with and without LSM

| Dataset | Result | Proposed Algorithm | Proposed Algorithm without LSM |
|---|---|---|---|
| Iris | Best (20, linSVM) | 0.01 | 0.05 |
| Iris | Average | 0.06 | 0.07 |
| Glass | Best (60, 1-NN) | 0.10 | 0.13 |
| Glass | Average | 0.10 | 0.13 |
| Ionosphere | Best (50, 5-NN) | 0.15 | 0.17 |
| Ionosphere | Average | 0.13 | 0.17 |
| Monk | Best (60, polySVM2) | 0.02 | 0.09 |
| Monk | Average | 0.08 | 0.08 |
| BreastCancer | Best (50, 11-NN) | 0.40 | 0.45 |
| BreastCancer | Average | 0.47 | 0.53 |
| Wine | Best (10, linSVM) | 0.10 | 0.15 |
| Wine | Average | 0.11 | 0.10 |
| USPS_nE | Best (60, linSVM) | 0.08 | 0.15 |
| USPS_nE | Average | 0.10 | 0.18 |
| USPS_E | Best (120, linSVM) | 0.88 | 1.09 |
| USPS_E | Average | 0.51 | 0.60 |
| Digit5 | Best (10, linSVM) | 0.06 | 0.08 |
| Digit5 | Average | 0.09 | 0.10 |
| MNIST | Best (5000, 1-NN) | 10,000 | 12,000 |
| MNIST | Average | 11,200 | 13,300 |

Comparing with the recent approaches

In this section, the highest classification accuracy of the proposed method is compared with three recent papers on the same datasets. In 2015, Calvo et al. [12] proposed a two-stage algorithm to generate prototypes on structural data using the dissimilarity space: first the structural data is embedded into the dissimilarity space, and then common prototype generation methods are applied. In 2017, a measure of ultrametricity was introduced for creating the dissimilarity space [46], and the performance of the KNN classifier on the transformed space was studied on the Iris and Ionosphere datasets. Structured Sparse Dictionary Selection (SSDS) was proposed in 2017 by Wang et al. [24] as a novel prototype selection method. Table 9 shows that the proposed method outperforms the three recent methods on the same datasets, which, besides the runtimes reported in the "Evaluating the LSM phase" section, denotes the impact and effectiveness of the proposed dissimilarity space representation.
Table 9
Comparing the highest classification performance (accuracy in percent) of the proposed method with the recent similar approaches (the values reported are the highest performance reported in each paper)
Methods
Iris
Ionosphere
USPS
Proposed method
98(± 0.2)(linSVM)
95.73(± 1.1)(5-NN)
96.80(± 1.0)(linSVM)
Calvo et al. [12]
86.70(1-NN)
Simovici et al. [46]
94.00(7-NN)
61.00(3-NN)
Wang et al. [24]
85.67(linSVM)

Conclusions

The dissimilarity space is a vector space whose dimensions are determined by comparing each sample in the training set with all members of the prototype set P. Therefore, the prototype selection strategy and the size of the prototype set are very important and effective [1]. However, in previous works the prototype set is identical for all samples [15, 25, 26]; that is, all of them must necessarily be compared with all the members of the prototype set P. Also, the neighborhood of each sample and the data structure on the manifold are not considered in prototype selection or in the mapping into the dissimilarity space. Therefore, in this paper a method is proposed to augment the dissimilarity space with manifold learning and increase the classification performance, considering the neighborhood of each sample on the manifold.
In the proposed method, first the data manifold is learned by the Locally Linear Embedding (LLE) algorithm [32, 34]. Then an undirected weighted graph is constructed according to the data manifold obtained by the LLE algorithm, and the geodesic distance matrix is computed. The k-nearest neighbors of each sample on the manifold, according to the geodesic distance, are selected as the members of its prototype set P. In the proposed process of constructing the dissimilarity space, it is enough to compare each sample with its k-nearest neighbors on the manifold, and not with all the members of the prototype set P. This is an effective factor in reducing the computational complexity. In the next step, the Latent Space Model further simplifies and reduces the dimensionality of the dissimilarity space and embeds all samples into the Euclidean latent space, which has a lower dimensionality than the constructed dissimilarity space [36]. Finally, the SVM and KNN classifiers are trained on the augmented dissimilarity space and the proposed method is evaluated in terms of classification accuracy and runtime. The advantages of the proposed method are as follows:
  • Creating the prototype set P dynamically and unevenly for all samples, so that each sample is not required to be compared with all the prototypes in the set P, but only compared with its k-nearest neighbors on the manifold and then the dissimilarity space is created. As a result, this has a great effect on reducing the computational complexity.
  • The proposed complete framework, in comparison with the method without LSM, requires less classification runtime, which indicates the positive effect of the LSM model on complexity reduction of the augmented dissimilarity space.
  • The proposed framework outperforms the accuracy of the latest similar approaches.
  • In most scenarios, the linear SVM classifier on the proposed method can achieve the highest accuracy.
The algorithm for estimating and embedding the dissimilarity space into the latent space is time-consuming. Therefore, using alternative algorithms in future work can improve the runtime of the proposed method. Also, in the LSM method, determining the optimal number of latent space dimensions and the number of iterations for the estimation step greatly influences the efficiency of the model and the classifiers. Other dimension reduction or mapping techniques to the latent space can also be explored in future work.

Acknowledgements

All the contributors are mentioned in the manuscript.

Declarations

This manuscript does not include studies involving human participants.
This manuscript does not contain any individual person’s data.

Competing interests

The authors declare that there is no conflict of interest regarding the manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Literature
1. Pękalska E, Duin RPW. The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. Singapore: World Scientific; 2005. p. 607.
2. Xu W. Non-Euclidean Dissimilarity Data in Pattern Recognition. Department of Computer Science, University of York; 2013.
3. Duin RPW, Pękalska E. The dissimilarity space: Bridging structural and statistical pattern recognition. Pattern Recogn Lett. 2012;33(7):826–32.
4. Goldfrab L, Kanal L, Rosenfeld A. A new approach to pattern recognition. Prog Pattern Recognition. 1985;2:241–402.
5. Pękalska E, Duin RPW. Beyond Traditional Kernels: Classification in Two Dissimilarity-Based Representation Spaces. IEEE Trans Syst Man Cybern. 2008;38(6):729–44.
6. Nanni L, et al. Experiments of image classification using dissimilarity spaces built with siamese networks. Sensors. 2021;21(5):1573.
7. Nanni L, et al. Spectrogram Classification Using Dissimilarity Space. Appl Sci. 2020;10(12):4176.
8. Duin RPW, Pękalska E. Zero-error dissimilarity based classifiers. arXiv preprint arXiv:1601.04451; 2016.
9. Duin RPW, Pękalska E, Loog M. Non-Euclidean Dissimilarities: Causes, Embedding and Informativeness. In: Pelillo M, editor. Similarity-Based Pattern Analysis and Recognition. London: Springer; 2013. p. 13–44.
10. Eskander GS, Sabourin R, Granger E. On the dissimilarity representation and prototype selection for signature-based bio-cryptographic systems. In: Hancock E, Pelillo M, editors. Similarity-Based Pattern Recognition: Second International Workshop, SIMBAD 2013, York, UK, July 3–5, 2013. Berlin, Heidelberg: Springer; 2013. p. 265–280.
11. Bunke H, Riesen K. Towards the unification of structural and statistical pattern recognition. Pattern Recogn Lett. 2012;33(7):811–25.
12. Calvo-Zaragoza J, Valero-Mas JJ, Rico-Juan JR. Prototype generation on structural data using dissimilarity space representation: a case of study. Pattern Recogn Image Analysis. 2015;9117:75–82.
13. Iwana BK, et al. Efficient temporal pattern recognition by means of dissimilarity space embedding with discriminative prototypes. Pattern Recogn. 2017;64:268–76.
14. Theodorakopoulos I, et al. HEp-2 cells classification via sparse representation of textural features fused into dissimilarity space. Pattern Recogn. 2014;47(7):2367–78.
15. Garcia V, Sanchez JS. Mapping microarray gene expression data into dissimilarity spaces for tumor classification. Inf Sci. 2015;294:362–75.
16. Avesani P, et al. Tractography Mapping for Dissimilarity Space across Subjects. In: 2015 International Workshop on Pattern Recognition in NeuroImaging; 2015.
17. Paclík P, Duin RPW. Dissimilarity-based classification of spectra: computational issues. Real-Time Imaging. 2003;9(4):237–244.
18. Jain B, Spiegel S. Dimension Reduction in Dissimilarity Spaces for Time Series Classification. In: Douzal-Chouakria A, Vilar JA, Marteau P-F, editors. Advanced Analysis and Learning on Temporal Data: First ECML PKDD Workshop, AALTD 2015, Porto, Portugal, September 11, 2015, Revised Selected Papers. Cham: Springer International Publishing; 2016. p. 31–46.
19. Bunke H, Riesen K. Graph Classification Based on Dissimilarity Space Embedding. In: da Vitoria Lobo N, et al., editors. Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshop, SSPR & SPR 2008, Orlando, USA, December 4–6, 2008, Proceedings. Berlin, Heidelberg: Springer; 2008. p. 996–1007.
20. Livi L. Designing labeled graph classifiers by exploiting the Rényi entropy of the dissimilarity representation. arXiv preprint arXiv:1408.5286; 2014.
21. Livi L, Rizzi A, Sadeghian A. Optimized dissimilarity space embedding for labeled graphs. Inf Sci. 2014;266:47–64.
22. Mekhazni D, et al. Unsupervised Domain Adaptation in the Dissimilarity Space for Person Re-identification. Cham: Springer International Publishing; 2020.
23. Uddin MK, et al. Fusion in dissimilarity space for RGB-D person re-identification. Array. 2021:100089.
24. Wang H, et al. Representative Selection with Structured Sparsity. Pattern Recogn. 2017;63(3):268–78.
25. Pękalska E, Duin RPW, Paclík P. Prototype selection for dissimilarity-based classifiers. Pattern Recogn. 2006;39(2):189–208.
26. Calana YP, et al. Prototype Selection for Dissimilarity Representation by a Genetic Algorithm. In: 2010 20th International Conference on Pattern Recognition; 2010.
27. Bernhauer D, Skopal T. Approximate search in dissimilarity spaces using GA. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion. Prague: Association for Computing Machinery; 2019. p. 279–280.
28. Silva LA, de Vasconcelos BP, Del-Moral-Hernandez E. A model to estimate the Self-Organizing Maps grid dimension for Prototype Generation. Intelligent Data Anal. 2021;25:321–38.
29. Devi VS, Kuruvilla SA, Aparna R. Prototype Selection and Dimensionality Reduction on Multi-Label Data. In: Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. Hyderabad: Association for Computing Machinery; 2020. p. 195–199.
30. Sisodia D, Sisodia DS. Quad division prototype selection-based k-nearest neighbor classifier for click fraud detection from highly skewed user click dataset. Eng Sci Technol Int J. 2021;9:78.
31. Fani SM, Boltenhagen M, van der Aalst W. Prototype Selection Using Clustering and Conformance Metrics for Process Discovery. Cham: Springer International Publishing; 2020.
32. Saul LK, Roweis ST. An Introduction to Locally Linear Embedding; 2000.
33. Wilson RC. Similarities, Distances and Manifold Learning; 2012.
34. Ventura D. Manifold Learning Examples – PCA, LLE and ISOMAP; 2008.
35. Goddard W, Oellermann OR. Distance in Graphs. In: Dehmer M, editor. Structural Analysis of Complex Networks. Boston: Birkhäuser Boston; 2011. p. 49–72.
36. Hoff PD, Raftery AE, Handcock MS. Latent space approaches to social network analysis. J Am Stat Assoc. 2002;97(460):1090–8.
37. Tran D, et al. Edward: A library for probabilistic modeling, inference, and criticism. arXiv preprint arXiv:1610.09787; 2016.
38. Dheeru D, Karra Taniskidou E. UCI Machine Learning Repository. California: University of California, School of Information and Computer Sciences; 2017.
39. Hull J. A database for handwritten text recognition research. IEEE Trans Pattern Anal. 1994;16(5):550–4.
40. LeCun Y, Cortes C. MNIST handwritten digit database; 2010.
41. Keysers D, et al. Adaptation in statistical pattern recognition using tangent vectors. IEEE Trans Pattern Anal Mach Intell. 2004;26(2):269–74.
44. van Rijsbergen CJ. Information Retrieval. 2nd ed.; 1979.
45. Kurasova O, Dzemyda G. Selection of the number of neighbours of each data point for the Locally Linear Embedding Algorithm. Vol. 36; 2007.
46. Simovici DA, Vetro R, Hua K. Ultrametricity of Dissimilarity Spaces and Its Significance for Data Mining. In: Guillet F, Pinaud B, Venturini G, editors. Advances in Knowledge Discovery and Management, vol. 6. Cham: Springer International Publishing; 2017. p. 141–55.
47. Mauceri S, Sweeney J. Dissimilarity-based representations for one-class classification on time series. Pattern Recogn. 2020;100:107122.
Metadata
Title
Dissimilarity space reinforced with manifold learning and latent space modeling for improved pattern classification
Authors
Azadeh Rezazadeh Hamedani
Mohammad Hossein Moattar
Yahya Forghani
Publication date
01-12-2021
Publisher
Springer International Publishing
Published in
Journal of Big Data / Issue 1/2021
Electronic ISSN: 2196-1115
DOI
https://doi.org/10.1186/s40537-021-00527-6
