1 Introduction
Classification learning still belongs to the main tasks in machine learning [5]. Although powerful methods are available, there is still a need for improvements and for alternatives to existing strategies. Huge progress was achieved by the realization of deep networks [15, 28]. These networks overcame the hitherto dominating support vector machines (SVM) in classification learning [11, 51]. However, deep architectures have the disadvantage that their interpretability is difficult at best. Therefore, great effort is currently spent to explain deep models, see [40] and references therein. However, due to the complexity of deep networks this is quite often impossible [70]. Thus, alternatives are required for many applications, for example in medicine [39]. A promising alternative is the concept of distance-based methods such as learning vector quantization [57], where prototypes serve as references for the data [25, 34, 37]. Further, prototype layers in deep networks also seem to stabilize the behavior of deep models [29].
Learning vector quantizers (LVQ) are sparse models for prototype-based classification learning, which were heuristically introduced by T. Kohonen [23, 24]. LVQ is a competitive learning algorithm relying on an attraction and repulsion scheme (ARS) for prototype adaptation, which can be described geometrically in the Euclidean setting. Generalized LVQ (GLVQ) keeps the idea of the ARS but takes LVQ to the next level: it optimizes a loss function approximating the classification error [42]. Standard training is stochastic gradient descent learning (SGDL) based on the local losses.
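To make the ARS and the local loss concrete, the following minimal NumPy sketch shows one SGDL step of GLVQ with the squared Euclidean distance and a sigmoid transfer function. The function name, learning rate, and concrete transfer function are illustrative choices, not the reference implementation of [42].

```python
import numpy as np

def glvq_sgd_step(x, y, prototypes, labels, lr=0.01):
    """One stochastic gradient step of GLVQ for a single sample (x, y)."""
    d = np.sum((prototypes - x) ** 2, axis=1)        # squared Euclidean distances
    correct = np.where(labels == y)[0]
    wrong = np.where(labels != y)[0]
    jp = correct[np.argmin(d[correct])]              # closest correct prototype w+
    jm = wrong[np.argmin(d[wrong])]                  # closest incorrect prototype w-
    dp, dm = d[jp], d[jm]
    mu = (dp - dm) / (dp + dm)                       # classifier function, mu in [-1, 1]
    s = 1.0 / (1.0 + np.exp(-mu))                    # sigmoid transfer f(mu)
    f_prime = s * (1.0 - s)                          # derivative f'(mu)
    # attraction and repulsion scheme (ARS): pull w+ toward x, push w- away
    prototypes[jp] += lr * f_prime * (4 * dm / (dp + dm) ** 2) * (x - prototypes[jp])
    prototypes[jm] -= lr * f_prime * (4 * dp / (dp + dm) ** 2) * (x - prototypes[jm])
    return prototypes
```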
Of particular interest for potential users is the easy interpretability of LVQ networks according to the prototype principle [63]. Further, LVQ networks belong to the class of classification margin optimizers like SVM [10] and are proven to be robust against adversarial attacks [41].
In standard LVQ networks, the distance measure is set to be the squared Euclidean one, yet other proximities can be applied [4]. For example, in the case of spectral data or histograms, divergences are beneficial [21, 31, 58], whereas functional data may benefit from similarities that include aspects of the slope, like Sobolev distances [60]. Even the kernel trick successfully applied in SVM can be adopted for GLVQ, denoted as kernelized GLVQ (KGLVQ) [59], i.e., the data and prototypes are implicitly mapped into a potentially infinite-dimensional Hilbert space but compared by means of the kernel-based distance calculated in the low-dimensional data space [45]. For reproducing kernel Hilbert spaces, the respective mapping is unique [52, 53]. This so-called feature mapping is frequently nonlinear, depending on the kernel in use [17].
However, SGDL training for KGLVQ requires differentiability of the kernel. Otherwise, median variants of GLVQ have to be considered, i.e., the prototypes are restricted to be data points and the optimization falls back to likelihood optimization [32]. Replacing the standard Euclidean distance by nonlinear kernel distances can improve the performance of the GLVQ model, as is known for the SVM. Yet, the possibly improved performance of KGLVQ comes with weaker interpretability, because the implicit kernel mapping does not allow observing the mapped data directly in the feature space.
Another way to accelerate the usually time-consuming training process of machine learning models is to make use of efficient quantum computing algorithms [9, 12, 33]. Several approaches have been developed for supervised and unsupervised problems [3, 46, 49, 50, 64]. An overview of recently developed quantum-inspired approaches can be found in [26].
Quantum support vector machines are studied in [36], and Hopfield networks are investigated in [35]. A quantum perceptron algorithm is proposed in [66]. In particular, nearest neighbor approaches are studied in [19, 65]. Related to vector quantization, the c-means algorithm is considered in [22, 69], which can also be seen in connection with quantum classification algorithms based on competitive learning [71].
In this contribution, we propose an alternative nonlinear data processing scheme, somewhat related to kernel GLVQ, keeping the idea of mapping the data nonlinearly into a particular Hilbert space. In particular, we transform the data vectors nonlinearly into quantum state vectors and require at the same time that the prototypes are always quantum state vectors, too [48, 61]. The respective quantum space is also a Hilbert space, and hence, we can take the data mapping as a kind of feature mapping like for kernels [16, 56]. The adaptation has to ensure the attraction and repulsion strategy known to be the essential ingredient of LVQ algorithms, but it has to be adapted here to the properties of quantum state spaces. The latter restriction requires the adaptations to be unitary transformations. The resulting algorithm is denoted as quantum-inspired GLVQ (Qu-GLVQ).
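As a rough illustration of this explicit mapping, the following sketch encodes each (normalized) feature as a single qubit and adapts prototypes by plane rotations, which are unitary and thus keep the prototypes valid quantum state vectors. The angle-based encoding and the update rule are assumptions for illustration; the concrete construction is developed in the following sections.

```python
import numpy as np

def qubit_encode(x):
    """Map a real vector with entries in [0, 1] to qubit amplitude pairs
    (cos(theta_i), sin(theta_i)) with theta_i = (pi / 2) * x_i."""
    theta = 0.5 * np.pi * np.asarray(x)
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)  # shape (n, 2), unit-norm rows

def rotate_prototype(psi_w, delta_theta):
    """Adapt a prototype by a plane rotation, i.e., a unitary transformation,
    so that each row remains a valid qubit state vector."""
    c, s = np.cos(delta_theta), np.sin(delta_theta)
    R = np.array([[c, -s], [s, c]])                           # 2x2 rotation, det = 1
    return psi_w @ R.T
```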
In fact, quantum-inspired machine learning algorithms seem to be a promising approach to improve the effectiveness of algorithms with respect to time complexity [55]. Yet, here the benefit would be twofold, because we also emphasize the mathematical similarity to kernel-based learning, which is still one of the most successful strategies in machine learning [53, 59]. However, a realization on a real quantum system is not in the focus of this paper and, hence, is left for future work. The focus of this paper is clearly to show the mathematical similarities between the kernel and quantum approaches.
The paper is structured as follows: Starting with a short introduction to standard GLVQ for real and complex data [14], we briefly explain the use of real and complex kernels in the context of GLVQ. Again, we consider both the real and the complex case. After these preliminaries, we give the basic mathematical properties of quantum state spaces and incorporate these concepts into GLVQ. We show that this approach is mathematically consistent with kernel GLVQ. Numerical simulations exemplify the successful application of the introduced Qu-GLVQ.
Further, we emphasize that the aim of the paper is not to show any improvement of quantum-inspired GLVQ over kernel approaches or standard GLVQ. Instead, one goal is to show that the quantum and the kernel approach are mathematically more or less equivalent, while kernel approaches apply the mapping implicitly and the quantum approach does the mapping explicitly.
4 Numerical experiments
We tested the proposed Qu-GLVQ algorithm on several datasets in comparison with established LVQ approaches as well as SVM. In particular, we compare Qu-GLVQ with standard GLVQ, KGLVQ, and SVM. For both SVM and KGLVQ, the rbf kernel was used with the same kernel width, obtained by a grid search. For all LVQ variants including Qu-GLVQ, we used only one prototype per class in all experiments.
The datasets are: a) WHISKY, a spectral dataset for classifying Scottish whisky, described in [1, 2]; b) WBCD, the UCI Wisconsin Breast Cancer Dataset; c) HEART, the UCI heart disease dataset; d) PIMA, the UCI diabetes dataset; and e) FLC1, a satellite remote sensing LANDSAT TM dataset [27].
The averaged results reported were obtained by 10-fold cross-validation. Only test accuracies are given; a sketch of this evaluation protocol follows below.
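The following sketch illustrates the evaluation protocol described above for the SVM baseline. The synthetic data stand in for the actual datasets, and the grid for the kernel width is an assumed range.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for one of the benchmark datasets (e.g., WBCD).
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# Grid search over the rbf kernel width, then 10-fold cross-validation.
param_grid = {"gamma": [2.0 ** k for k in range(-8, 4)]}  # assumed grid range
svm = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
scores = cross_val_score(svm, X, y, cv=10)
print(f"test accuracy: {100 * scores.mean():.1f} +/- {100 * scores.std():.1f} %")
```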
Table 1
Classification accuracies in % (mean ± standard deviation, 10-fold cross-validation) for GLVQ, Qu-GLVQ, KGLVQ, and SVM on the considered datasets; for SVM, the number of support vectors |SV| is given in parentheses

Dataset | #Samples | #Features | #Classes | GLVQ | Qu-GLVQ | KGLVQ | SVM
WHISKY | 1188 | 401 | 3 | 92.5 ± 1.9 | 92.6 ± 1.4 | 86.9 ± 2.8 | 99.3 ± 0.7 (|SV|: 140)
WBCD | 569 | 30 | 2 | 96.8 ± 2.7 | 96.1 ± 3.4 | 94.0 ± 3.4 | 96.5 ± 1.5 (|SV|: 43)
HEART | 297 | 13 | 2 | 56.9 ± 4.0 | 81.8 ± 5.3 | 84.3 ± 5.4 | 80.4 ± 9.5 (|SV|: 97)
FLC1 | 82045 | 12 | 10 | 92.8 ± 0.3 | 92.0 ± 0.4 | 94.8 ± 0.4 | 97.4 ± 0.1 (|SV|: 2206)
PIMA | 768 | 8 | 2 | 77.3 ± 2.3 | 75.4 ± 2.4 | 76.4 ± 4.6 | 75.5 ± 4.1 (|SV|: 311)
Average | | | | 83.3 ± 2.2 | 87.6 ± 2.6 | 87.3 ± 3.3 | 89.8 ± 3.2
We observe an overall good performance of Qu-GLVQ, comparable to the other methods, see Table 1. Kernel methods seem to be beneficial, as indicated by KGLVQ and SVM for HEART. Yet, Qu-GLVQ delivers similar results. Where SVM yields significantly better performance than KGLVQ and Qu-GLVQ, one has to take into account that the SVM model complexity (number of support vectors) is much higher than in LVQ networks, where the number of prototypes was always chosen to be only one per class. Further, for WHISKY, KGLVQ was not able to achieve a classification accuracy comparable to the other approaches, whereas Qu-GLVQ performed well. This might be attributed to the crucial dependency of the rbf kernel on the kernel width. Further, Qu-GLVQ seems to be more stable than KGLVQ and SVM, considering the averaged deviations.
5 Conclusion
In this contribution, we introduced an approach to incorporate quantum machine learning strategies into the GLVQ framework. The usual data and prototype vectors are replaced by respective quantum bit (qubit) vectors. This replacement can be seen as an explicit nonlinear mapping of the data into the quantum Hilbert space, which makes the difference to the implicit feature mapping in the case of kernel methods like kernelized GLVQ or SVM. Thus, one can visualize and analyze the data as well as the prototypes in this space, such that Qu-GLVQ becomes better interpretable than KGLVQ or SVM, which lack this possibility.
Otherwise, the Qu-GLVQ approach shows mathematical equivalence to the kernel approaches in a topological sense: the distance calculations are carried out in the mapping Hilbert space, implicitly for usual kernels and explicitly for Qu-GLVQ. The resulting adaptation dynamic in Qu-GLVQ is consistent with the unitary transformations required for quantum state changes, because the prototypes remain quantum state vectors.
Further investigations should include several modifications and extensions of the proposed Qu-GLVQ. First, matrix learning as in GMLVQ can easily be integrated. Further, the influence of different transfer functions, as proposed in [62] for standard GLVQ, has to be considered to improve learning speed and performance. A much more challenging task will be to consider entanglement of qubits and complex amplitudes \(\beta \in \mathbb{C}\) for qubits [20] in the context of Qu-GLVQ. This research should also continue the approach in [71].
Finally, the adaptation to a real quantum system is the ultimate goal, as recently realized for other classifier systems [47].
Overall, Qu-GLVQ performs equivalently to KGLVQ and SVM. However, due to its better interpretability, it could be an interesting alternative whenever interpretable models are demanded.