Complex & Intelligent Systems

Open Access 21.09.2022 | Original Article

Fusing depth local dual-view features and dual-input transformer framework for improving the recognition ability of motion artifact-contaminated electrocardiogram

Authors: Shuaiying Yuan, Ziyang He, Jianhui Zhao, Zhiyong Yuan

Published in: Complex & Intelligent Systems | Issue 1/2023

Abstract

ECG signals recorded by wearable heart-monitoring devices are often contaminated by various noises to varying degrees. Using signal quality indicators (SQIs) to achieve signal quality assessment (SQA) is among the most promising ways to solve this problem, but the performance of SQIs in expressing the quality features of ECGs contaminated by motion artifact (MA) noise remains disappointing. Here, we present a novel SQA method that fuses the proposed depth local dual-view (DLDV) features with a dual-input transformer (DI-Transformer) framework to improve the recognition of MA-contaminated ECGs. The DLDV features are designed to identify subtle differences between MA and ECG through depth local amplitude and phase angle features. When they are fused with the temporal relationship features extracted by the DI-Transformer, accuracy improves significantly compared with SQI-based methods. In addition, we verify the robustness and accuracy of the DLDV features on four traditional classifiers. We conduct our experiments on two datasets. On the PhysioNet/Computing in Cardiology Challenge dataset, the DLDV features (Acc = 95.49%) outperform the combination of six SQI features (Acc = 91.26%). When combined with our DI-Transformer, they deliver an accuracy of 99.62%, outperforming the state-of-the-art SQA methods. On the artificial test set constructed with MA noise, our DI-Transformer outperforms four traditional methods and delivers an accuracy of 97.69%.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

Traditional electrocardiogram (ECG) analysis usually requires doctors to diagnose and treat patients based on the waveform information in the ECG. However, ECGs recorded by wearable devices are commonly contaminated by various noises, especially motion artifacts [MA: muscle artifact (ma) and electrode motion artifact (em)]. The result is a large number of poor-quality signals, which seriously hinders diagnosis and delays timely treatment. To make matters worse, some MA frequency components overlap with the band of the ECG signal, which limits filtering in the frequency domain, while some MA segments have a morphology similar to that of ECG waves, which limits filtering in the time domain [1]. It is challenging to eliminate these noises without distorting the clinical features [2].

In general, there are two ways to solve this problem. The first is to use denoising techniques [3‐6], which work well on baseline wander, high-frequency noise, etc., but struggle to remove the MA noise mentioned above. The other is to discard signals heavily contaminated by MA through signal quality assessment (SQA) [7, 8]. Currently, mainstream SQA methods can be roughly divided into two categories. The first category is based on traditional machine learning and signal quality indicators (SQIs) [9‐14]. For example, Xia et al. proposed an ECG SQA method based on a support vector machine (SVM) and multi-feature fusion with waveform attributes, power spectrum, R-wave detection, and other characteristics [9]. Behar et al. employed indicators such as kSQI, sSQI, pSQI, basSQI, bSQI, pcaSQI, and rSQI, and trained an SVM model to evaluate the quality of ECG signals and reduce false alarms [10]. Satija et al. calculated SQIs through signal loss detection, baseline mutation extraction, and high-frequency noise detection and extraction to evaluate the clinical acceptability of ECG signals [11]. Zhang et al. adopted waveform feature-based methods (including lead-off features, baseline wander features, power spectral features, and nonlinear features) to train random forest and SVM models for SQA [12]. Shahriari et al. used a structural similarity measure (SSIM) to compare ECG images obtained from two ECGs at standard scales. A representative subset of ECG images is then selected from the training set as templates by a clustering method. Finally, the SSIM between each image and all templates is used as the feature set to train a linear discriminant analysis classifier for SQA [13]. Holzinger et al. provided a taxonomy of entropy methods, describing in more detail approximate entropy, sample entropy, fuzzy entropy, and particularly topological entropy for finite sequences. They also state that entropy measures have been tested successfully for analyzing short, sparse, and noisy time series data [14]. These hand-crafted features have the advantage of interpretability and can describe specific ECG characteristics to a certain extent. However, SQIs rely on human-defined desirable properties of clean signals, which inherently limits their ability to express latent features of signal quality. Moreover, they rarely consider effective ECG feature extraction under MA interference.

The second category is deep learning-based methods [15‐18], which usually utilize abstract features extracted by deep learning techniques or combine them with hand-crafted features to implement SQA. For instance, Liu et al. proposed a method that combines deep learning-based Stockwell Transform (S-Transform) spectrogram features and hand-crafted statistical features to achieve SQA [15]. Huerta et al. combined convolutional neural networks and the wavelet transform to robustly identify high-quality ECG segments in the challenging setting of single-lead recordings of alternating sinus rhythms, atrial fibrillation episodes, and other rhythms [16]. Seeuws et al. used an unsupervised deep learning model to derive a data-driven quality metric that outperformed some traditional metrics (kSQI, sSQI, IOR, pSQI, basSQI, bSQI, and pcaSQI) and highlighted the consistently superior performance of their metric across different tasks [17]. Zhang et al. designed a comprehensive feature set (covering spectral distribution, signal complexity, horizontal and vertical variations of waves, etc.) and utilized two long short-term memory (LSTM) layers to learn time-dependent features automatically [18]. Compared with hand-crafted features, the abstract features extracted by deep learning methods describe ECG recordings from a different perspective. However, these methods seldom offer effective solutions for MA interference that has a morphology similar to, and frequency bands aliased with, some ECG signals. In addition, they rarely address the interpretability of these features or the relationships between them.

Here, we mainly address two problems: (1) noise such as MA, whose morphology is similar to and whose frequency bands are aliased with some ECG signals, can easily deceive machine learning methods, resulting in low SQA accuracy; (2) hand-crafted features require substantial human intervention and cannot express signal quality comprehensively. We propose a novel SQA method that fuses depth local dual-view (DLDV) features and a dual-input Transformer (DI-Transformer) framework to improve the recognition of MA-contaminated ECGs. Specifically, we extract the first three intrinsic mode functions (FT-IMF) of the signal through empirical mode decomposition (EMD) [19] and then employ the fast Fourier transform (FFT) [20] to explore the deeper local amplitude and phase angle features of the FT-IMF. The DLDV features are then dimensionally reduced by kernel principal component analysis (KPCA) [21] and used to identify subtle differences between MA and the ECG signal. At the same time, we analyze the central tendency and dispersion degree of the FT-IMF and combine the result with the dimensionality-reduced DLDV features to form augmented features (FT-IMF$_\mathrm{all}$). Finally, FT-IMF$_\mathrm{all}$ is fused with the temporal relational features extracted from the Raw ECG by the proposed DI-Transformer framework to train the SQA model. In particular, the phase angle features we extract contain the contribution of each time sample point, so they can quantify subtle changes in the ECG at the level of individual samples and can therefore distinguish the nuances between ECG and MA. As far as we know, there is no literature on extracting DLDV features (phase angle and amplitude–frequency features) from the FT-IMF to achieve SQA. Only Lee et al. calculated the mean, variance, and Shannon entropy of the first IMF (F-IMF) obtained by EMD and used them for SQA [22]. These indicators reflect the signal’s central tendency and dispersion degree but cannot fully reflect the deeper local features needed to distinguish MA noise, because the feature information computed by their method loses the temporal features. In this paper, the DLDV features extracted from the FT-IMF not only solve the problem that traditional methods cannot capture the characteristic features of MA, but also have the advantage of interpretability. We also verify the accuracy and robustness of the DLDV features on four traditional classifiers and provide an accurate and efficient SQA scheme based on K-Nearest Neighbor (KNN). In addition, our proposed DI-Transformer model is based on the transformer [23] architecture, whose multi-head attention module can be executed in parallel and can capture the temporal relationships in the ECG signal. Combining our features with the transformer model overcomes the need of traditional machine learning for full human intervention while accurately distinguishing MA noise from ECGs. The contributions of this study can be summarized as follows:

The proposed DLDV features can identify subtle differences between MA and ECG signals through depth local amplitude and phase angle features, which provides a practical and novel solution for identifying MA-contaminated ECGs.
The proposed DI-Transformer can focus on the temporal relationship between sample points and reflect the subtle local changes in the signal sequence, which can effectively improve the model’s ability to identify MA-contaminated ECG.
The strategy of fusing the DLDV features with the DI-Transformer’s temporal relational features extracted from the Raw ECG significantly improves the accuracy of MA noise recognition and is applicable to settings such as wearable ECG monitoring devices.
For the first time, we propose the DLDV features to solve the ECG SQA problem and achieve an accuracy of 94.27% on G-SVM and 93.32% on KNN, and the result outperforms six traditional SQIs. More importantly, we obtain the best accuracy (99.62%) on the proposed DI-Transformer, which outperforms other state-of-the-art SQA methods.

This paper is organized as follows: “Methodology” presents the data used in the experiments and the details of our method. “Experiments and results” demonstrates the experimental results. Finally, we discuss and conclude our work in “Discussion” and “Conclusion”.

Methodology

We present the overall framework of the proposed SQA method in Fig. 1. It mainly consists of three parts: data preprocessing, DLDV feature extraction with KPCA, and the DI-Transformer framework. The DI-Transformer framework in turn consists of two parts: the transformer encoder layer and the classification layer. The following sections describe each part in detail.

The DLDV features extraction and KPCA

DLDV features extraction

Our DLDV feature extraction method starts from EMD [19]. EMD can effectively process non-linear and non-stationary time-series signals, such as ECG signals. Unlike the FFT and the discrete wavelet transform (DWT) [24], EMD reveals the inherent features of a signal through the IMFs it decomposes the signal into. It represents a signal as a combination of multiple IMF components, ordered from high to low frequency. Different IMFs reflect the feature information of signal and noise to different degrees.

In general, some MA noise has a morphology similar to, and a frequency band overlapping with, some ECG signals, so traditional denoising methods cannot effectively eliminate it. We find, however, that the local nuances between them can be expressed by the IMFs. Therefore, we design a dedicated method to obtain the DLDV features of MA-contaminated ECGs. Figure 2 shows the architecture of the proposed DLDV feature extraction method. The light green areas represent the key modules of the proposed method, which we name the DLDV feature extraction module (DLDV-FEM); it is composed of a stack of $N = 3$ identical modules. Each module has two sub-modules. The first is an FFT-based sub-module, and the second is a statistical analysis-based sub-module (SA-based sub-module). After performing the EMD operation on x[n], we obtain its FT-IMF components (F-IMF: the first IMF, S-IMF: the second IMF, and T-IMF: the third IMF). When we feed F-IMF to DLDV-FEM through the “Input” pipeline, the FFT-based sub-module obtains its amplitude and phase angle in the frequency domain through the FFT [20] operation (denoted as FT-IMF$_\mathrm{f}$). Meanwhile, the SA-based sub-module obtains its central tendency and degree of dispersion (denoted as FT-IMF$_\mathrm{t}$). Then, FT-IMF$_\mathrm{t}$ and FT-IMF$_\mathrm{f}$ are output together to FT-IMF$_\mathrm{F}$ through the lavender pipeline. When the remaining S-IMF and T-IMF pass through the DLDV-FEM module in turn, we get two further output components (S-IMF$_\mathrm{S}$ and T-IMF$_\mathrm{T}$). The FT-IMF$_\mathrm{f}$ parts of these three output components are concatenated to form our FT-IMF$_\mathrm{freq}$ (DLDV) features, and the three FT-IMF$_\mathrm{t}$ parts are concatenated to form our FT-IMF$_\mathrm{time}$ features. Finally, the output features (FT-IMF$_\mathrm{all}$) of the entire module are obtained by concatenating FT-IMF$_\mathrm{freq}$ and FT-IMF$_\mathrm{time}$. Next, we describe the feature extraction process in detail:

Let $X \in {\mathbb {R}}^{12 \times \ell }$ denote a multi-lead ECG signal and $X_\mathrm{f} \in {\mathbb {R}}^{1 \times \ell }$ the f-th lead of that signal, where $f \in [1, \ldots , 12]$ is the lead index and $\ell$ is the length of the ECG segment. After performing the EMD operation according to [19], we obtain the IMFs as follows:

$$\begin{aligned} \mathrm{IMF}_{\mathrm{f}, p}[n] = \left\{ \begin{array}{ll} X_{\mathrm{f}}[n]-r_{\mathrm{f}, p}[n], &amp; p = 1 \\ X_{\mathrm{f}}[n]-\sum _{q = 1}^{p-1} \mathrm{IMF}_{\mathrm{f}, q}[n]-r_{\mathrm{f}, p}[n], &amp; p>1, \end{array}\right. \end{aligned}$$

(1)

where n is the sample index within the ECG segment, $\mathrm{IMF}_{\mathrm{f},p}[n]$ represents the p-th IMF of the f-th lead, $p \in [1,2,\ldots ,N]$, N is the total number of IMF layers (here, N is 3 and f is 1), and $r_{f,p}[n]$ is the residual signal generated when the f-th lead signal passes through the p-th layer of EMD. Note that this paper mainly uses the FT-IMF (F-IMF, S-IMF, and T-IMF) components of EMD, because the dynamics of the FT-IMF behave as though they have been passed through a high-pass filter [25]. Hence, it is not surprising that the FT-IMF contains dynamics associated with noise for any well-sampled data [26].
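The relation in Eq. (1) can be checked numerically. The sketch below reads Eq. (1) as "the p-th IMF equals the signal minus all earlier IMFs minus the p-th residual", which follows from the usual EMD recursion $r_p = r_{p-1} - \mathrm{IMF}_p$ with $r_0 = X_\mathrm{f}$. It does not implement the sifting process of EMD itself: the three sinusoids are hypothetical stand-ins for F-IMF, S-IMF, and T-IMF, and the function names are illustrative.

```python
import numpy as np

def residuals_from_imfs(x, imfs):
    """Return r_1..r_N, where r_p = x - (IMF_1 + ... + IMF_p)."""
    return x[None, :] - np.cumsum(imfs, axis=0)

def imf_via_eq1(x, imfs, residuals, p):
    """Recover the p-th IMF (1-based p) from x, the earlier IMFs, and r_p."""
    earlier = imfs[: p - 1].sum(axis=0)  # empty sum (all zeros) when p == 1
    return x - earlier - residuals[p - 1]

# Hypothetical components, ordered from high to low frequency.
n = np.arange(3000)
imfs = np.stack([
    0.2 * np.sin(2 * np.pi * n / 12),
    0.5 * np.sin(2 * np.pi * n / 60),
    1.0 * np.sin(2 * np.pi * n / 400),
])
trend = 0.001 * n                      # plays the role of the final residual
x = imfs.sum(axis=0) + trend
residuals = residuals_from_imfs(x, imfs)
```

With these inputs, `imf_via_eq1(x, imfs, residuals, p)` returns the p-th component for p = 1, 2, 3, and the last residual equals the slow trend.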

Figure 3 shows the FT-IMF of a clean signal, a bw-contaminated signal, an ma-contaminated signal, and an em-contaminated signal, respectively. We find several interesting phenomena: (1) the amplitude values of the IMFs of the noise-contaminated ECG signals are significantly lower than those of the clean signals. (2) The FT-IMF components of EMD contain almost no bw noise (there is almost no difference between the corresponding IMF components in Fig. 3a, b), but they reflect the inherent features of em and ma noise well (the FT-IMF of the noisy signals in Fig. 3c, d reflect the feature information of the noise to varying degrees). (3) R peaks have higher amplitude values in each IMF component, while em or ma noise resembling R peaks has different amplitude values in different IMFs. The difference of the ma artifacts in each IMF component is marked in light purple in Fig. 3c, and it can be seen that ma is manifest to different degrees in all three components. In Fig. 3d, the difference of the em artifacts in each IMF component is likewise marked in light purple, and it can be seen that em has obvious characteristics in T-IMF. These phenomena indicate that the FT-IMF contains features that help recognize MA-contaminated ECGs. Therefore, we utilize the FFT-based sub-module to extract the amplitude and phase angle of F-IMF, S-IMF, and T-IMF in the frequency domain, and concatenate the features obtained from the three components:

$$\begin{aligned} {\text {FT-IMF}}_{\mathrm{freq}} = \mathrm{Concat}\left\{ \begin{array}{l} \mathrm{angle}(fft({\text {F-IMF}})),\Vert fft({\text {F-IMF}})\Vert \\ \mathrm{angle}(fft({\text {S-IMF}})),\Vert fft({\text {S-IMF}})\Vert \\ \mathrm{angle}(fft({\text {T-IMF}})),\Vert fft({\text {T-IMF}})\Vert \end{array}\right\} ,\nonumber \\ \end{aligned}$$

(2)

where FT-IMF$_\mathrm{freq} \in {\mathbb {R}}^{3 \times 2\ell }$, $\Vert \cdot \Vert $ denotes the absolute value (modulus) operation, ${\text {angle}}(\cdot )$ computes the phase angle, ${\text {fft}}(\cdot )$ denotes the FFT, and ${\text {Concat}}(\cdot )$ denotes concatenation. Simultaneously, we utilize the SA-based sub-module to analyze the central tendency and dispersion degree of the FT-IMF in the time domain, and concatenate the features obtained from the three components:

$$\begin{aligned} {\text {FT-IMF}}_\mathrm{time } = \mathrm{Concat}\left\{ \begin{array}{l} \mathrm{mean}({\text {F-IMF}}), \mathrm{var}({\text {F-IMF}}) \\ \mathrm{mean}({\text {S-IMF}}), \mathrm{var}({\text {S-IMF}}) \\ \mathrm{mean}({\text {T-IMF}}), \mathrm{var}({\text {T-IMF}}) \end{array}\right\} , \end{aligned}$$

(3)

where the ${\text {mean}}(\cdot )$ is the averaging operation, ${\text {var}}(\cdot )$ represents the operation of calculating variance, and FT-IMF$_\mathrm{time} \in {\mathbb {R}}^{3 \times 2}$.
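Equations (2) and (3) can be sketched in a few lines of NumPy. The snippet assumes the three IMFs have already been produced by an EMD routine (not shown) and are stacked as rows of an array; the function name `dldv_features` and the choice of population variance (`ddof=0`) are illustrative assumptions, not details given in the text.

```python
import numpy as np

def dldv_features(imfs):
    """Compute FT-IMF_freq (Eq. 2) and FT-IMF_time (Eq. 3).

    imfs: array of shape (3, L) holding F-IMF, S-IMF, T-IMF as rows.
    Returns an array of shape (3, 2L) and an array of shape (3, 2).
    """
    imfs = np.asarray(imfs, dtype=float)
    spectrum = np.fft.fft(imfs, axis=1)
    # Per IMF: phase angles followed by magnitudes, as in each row of Eq. (2).
    ft_imf_freq = np.concatenate([np.angle(spectrum), np.abs(spectrum)], axis=1)
    # Per IMF: mean and variance, as in each row of Eq. (3).
    ft_imf_time = np.stack([imfs.mean(axis=1), imfs.var(axis=1)], axis=1)
    return ft_imf_freq, ft_imf_time

# Toy input: three sinusoids standing in for real IMFs (L = 64 samples).
L = 64
n = np.arange(L)
toy_imfs = np.stack([np.sin(2 * np.pi * c * n / L) for c in (4, 2, 1)])
freq_feats, time_feats = dldv_features(toy_imfs)
```

For the first toy row (4 cycles in 64 samples), the magnitude at bin 4 is L/2 = 32 and the phase there is -π/2, which is the expected FFT of a sine; real IMFs would of course give far less tidy spectra.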

Figure 4 shows an example of the features extracted from the em- and ma-contaminated signals at each stage. Figure 4a shows the amplitude–frequency features of the em-contaminated ECG, and Fig. 4b the corresponding phase angle features. Figure 4c shows the amplitude–frequency features of the ma-contaminated ECG, and Fig. 4d the corresponding phase angle features. It can be seen that, at non-zero frequencies of the components decomposed from the em- or ma-contaminated ECG, the corresponding phase angle is also non-zero and shows no obvious periodicity, whereas the phase angle features of the clean signal are periodic, in line with the periodic nature of the ECG signal. In addition, the phase angle can reflect the local change of the signal waveform at a certain moment [27], so the depth features extracted in this way capture the subtle differences between signal and noise well. Finally, we obtain FT-IMF$_\mathrm{freq}$ and FT-IMF$_\mathrm{time}$; we also refer to FT-IMF$_\mathrm{freq}$ as $X_\mathrm{DLDV}$.

DLDV feature dimension reduction

Principal component analysis (PCA) [28] is one of the essential methods for linear dimensionality reduction. Each principal component is a projection of the data onto a certain direction, and the variance along each direction is given by the corresponding eigenvalue. In the dimensionality reduction process, the eigenvalues are sorted from large to small, and the projections onto the eigenvectors corresponding to the first k eigenvalues are used as dimensionality-reduced features that express the information we are interested in. However, the data we need to process are nonlinear and non-stationary ECG signals. Therefore, this paper adopts kernel principal component analysis (KPCA) [21]. KPCA assumes that the data live in a higher-dimensional space (a Hilbert space) and performs PCA there. The advantage is that, for nonlinear data points that are difficult to separate in a lower-dimensional space, an effective projection direction may be found in the higher-dimensional space. Since the dimensionality of the DLDV features (non-linear features) is too high and some of them hardly contribute to classification (as reflected in Fig. 4), we utilize KPCA to reduce the dimensionality of the DLDV features.

For PCA, let $\mathbf {{\textbf {X}}}_\mathrm{DLDV} = \left[ x_{1}, x_{2}, \ldots , x_{n}\right] , \varvec{{\textbf {X}}}_\mathrm{DLDV} \in {\mathbb {R}}^{n \times d}$, where n is the number of feature sequences in ${\textbf {X}}_\mathrm{DLDV}$ and d is the dimension of each sequence. After performing PCA, we get the following decomposition model:

$$\begin{aligned} \mathbf {{\textbf {X}}}_\mathrm{DLDV} = \varvec{{\textbf {S}}}_{1} \varvec{{\textbf {U}}}_{1}^{T}+\varvec{{\textbf {S}}}_{2} \varvec{{\textbf {U}}}_{2}^{T}+\cdots +\varvec{{\textbf {S}}}_{d} \varvec{{\textbf {U}}}_{d}^{T}, \end{aligned}$$

(4)

where ${\textbf {S}}_{t}\ (1 \le t \le d)$ and ${\textbf {U}}_{t}\ (1 \le t \le d)$ represent the principal component vector and the corresponding projection vector, respectively. Since the ${\textbf {U}}_{t}$ form a set of orthonormal vectors, the principal component ${\textbf {S}}_{t}$ can be expressed as ${\textbf {S}}_{t} = {\textbf {X}}_\mathrm{DLDV} {\textbf {U}}_{t}$. The projection vector ${\textbf {U}}_{t}$ can therefore be calculated by solving the eigenvalue problem:

$$\begin{aligned} \gamma _{\mathrm {t}} \varvec{{\textbf {U}}}_{t} = \frac{1}{n-1} \varvec{{\textbf {X}}}_\mathrm{DLDV}^{T} \varvec{{\textbf {X}}}_\mathrm{DLDV} \varvec{{\textbf {U}}}_{t}. \end{aligned}$$

(5)
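As a concrete illustration of Eqs. (4) and (5), the following NumPy sketch solves the eigenvalue problem and verifies that the principal components reconstruct the data. It assumes the feature matrix has been mean-centered beforehand, which the text does not state explicitly; `pca_decompose` is an illustrative name.

```python
import numpy as np

def pca_decompose(X):
    """Solve Eq. (5) and return eigenvalues, projection vectors, components.

    X: (n, d) data matrix, assumed mean-centered column-wise.
    """
    n = X.shape[0]
    cov = X.T @ X / (n - 1)              # matrix on the right of Eq. (5)
    gamma, U = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(gamma)[::-1]      # sort from large to small
    gamma, U = gamma[order], U[:, order]
    S = X @ U                            # column t is S_t = X U_t
    return gamma, U, S

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X = X - X.mean(axis=0)                   # centering (assumption, see above)
gamma, U, S = pca_decompose(X)
X_rebuilt = S @ U.T                      # sum over t of S_t U_t^T, Eq. (4)
```

Keeping only the first k columns of `S` gives the dimensionality-reduced representation; keeping all d columns reproduces `X` exactly, as Eq. (4) states.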

For KPCA, we define a mapping ${\textbf {X}} _\mathrm{DLDV} \in {\mathbb {R}}^{n \times d} \rightarrow \varvec{\mathbb {\aleph }}\left( \mathrm {{\textbf {X}}}_\mathrm{DLDV}\right) \in {\mathbb {R}}^{n \times p}$, where $\varvec{\mathbb {\aleph }}(\cdot )$ denotes a nonlinear mapping function that maps the signal to the Hilbert function space ($\varvec{\beth }$), and p represents the dimension of the feature space. We denote the mapping of ${\textbf {X}}_\mathrm{DLDV}$ to the $\varvec{\beth }$ space as:

$$\begin{aligned} \varvec{\mathbb {\aleph }}\left( {\varvec{X}}_\mathrm{DLDV}\right) = {\varvec{S}}_{1} {\varvec{U}}_{1}^{T}+{\varvec{S}}_{2} {\varvec{U}}_{2}^{T}+\cdots +{\varvec{S}}_{p} {\varvec{U}}_{p}^{T}. \end{aligned}$$

(6)

For the nonlinear case, it is difficult to solve for ${\textbf {U}}_{t}$ by simply replacing ${\textbf {X}}_\mathrm{DLDV}$ with $\varvec{\mathbb {\aleph }}({\textbf {X}}_\mathrm{DLDV} )$ according to (6), because the mapping function $\varvec{\mathbb {\aleph }}(\cdot )$ is unknown. To address this problem, we introduce the kernel trick to develop the KPCA model. Following [29], ${\varvec{U}}_{t}$ can be expanded in the feature space as $\mathrm {{\textbf {U}}}_{\mathrm {t}} = \varvec{\mathbb {\aleph }}^{T}\left( {\varvec{X}}_\mathrm{D L D V}\right) \varvec{\beta }_{t}$, where $\varvec{\beta }_{t}$ is a linear transformation vector. Thus, the eigenvalue problem (5), written for the mapped data in (6), is transformed into:

$$\begin{aligned} \gamma _{t} \varvec{\beta }_{t} = \frac{1}{n-1}\left( \varvec{\mathbb {\aleph }}\left( {\varvec{X}}_\mathrm{D L D V}\right) \varvec{\mathbb {\aleph }}^{T}\left( \varvec{{\varvec{X}}}_\mathrm{D L D V}\right) \right) \varvec{\beta }_{t}, \end{aligned}$$

(7)

where $K = \varvec{\mathbb {\aleph }}({\textbf {X}}_\mathrm{DLDV} ) \varvec{\mathbb {\aleph }}^{T} ({\textbf {X}}_\mathrm{DLDV} )$ is the kernel matrix, whose elements are calculated by the Gaussian kernel function $k(x, y) = e^{-\left( \frac{\left\| x-y\right\| ^{2}}{w}\right) }$, and w represents the bandwidth of the Gaussian kernel.

For a given test vector ${\varvec{X}}_\mathrm{D L D V}^{j} \in {\mathbb {R}}^{d}$, the j-th DLDV feature vector, the corresponding kernel principal component can be calculated by [30‐32]:

$$\begin{aligned} {\varvec{S}}_{\mathrm {t}}\left( {\varvec{X}}_\mathrm{D L D V}^{j}\right)= & {} \varvec{\mathbb {\aleph }}\left( {\varvec{X}}_\mathrm{D L D V}^{j}\right) \varvec{\mathbb {\aleph }}^{T}\left( {\varvec{X}}_\mathrm{D L D V}\right) \varvec{\beta _{t}} \nonumber \\= & {} k\left( {\varvec{X}}_\mathrm{D L D V}^{j}, {\varvec{X}}_\mathrm{D L D V}\right) \varvec{\beta _{t}}, \end{aligned}$$

(8)

where $t = 1,2,\ldots , k$ indexes the first k components retained after dimensionality reduction, that is, $\mathrm {{\varvec{S}}_{t}}\left( {\varvec{X}}_\mathrm{D L D V}\right) \in {\mathbb {R}}^{1 \times k}$. Here, we determine the value of k by the cumulative contribution rate of the principal components. Usually, if the cumulative contribution rate (P) of the first k principal components reaches 80–90%, the first k principal components essentially contain the main information of all measurement indicators. To retain as much information as possible while still reducing dimensionality, we keep the smallest set of leading principal components for which $P\ge 95\%$:

$$\begin{aligned} P = \frac{\sum _{i = 1}^{k} \gamma _{i}}{\sum _{i = 1}^{d} \gamma _{i}}. \end{aligned}$$

(9)
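The KPCA steps in Eqs. (7) to (9) can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes the usual Gaussian form $e^{-\Vert x-y\Vert ^{2}/w}$ for the kernel, omits kernel-matrix centering and eigenvector rescaling (neither is described in the text), and uses illustrative function names and toy data.

```python
import numpy as np

def gaussian_kernel(A, B, w):
    """k(x, y) = exp(-||x - y||^2 / w) for every row pair of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / w)   # clamp tiny negative round-off

def kpca_fit(X, w):
    """Solve Eq. (7): gamma_t * beta_t = K beta_t / (n - 1)."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, w)
    gamma, beta = np.linalg.eigh(K / (n - 1))
    order = np.argsort(gamma)[::-1]           # eigenvalues from large to small
    return np.clip(gamma[order], 0.0, None), beta[:, order]

def smallest_k(gammas, threshold=0.95):
    """Smallest k whose cumulative contribution rate P (Eq. 9) >= threshold."""
    g = np.sort(np.asarray(gammas, dtype=float))[::-1]
    P = np.cumsum(g) / g.sum()
    return min(int(np.searchsorted(P, threshold)) + 1, len(g))

def kpca_transform(x_new, X, beta_k, w):
    """Eq. (8): S_t(x) = k(x, X) beta_t for the retained components."""
    return gaussian_kernel(np.atleast_2d(x_new), X, w) @ beta_k

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))                  # toy stand-in for X_DLDV
gamma, beta = kpca_fit(X, w=8.0)
k = smallest_k(gamma, 0.95)
S = kpca_transform(X, X, beta[:, :k], w=8.0)  # shape (30, k)
```

On real DLDV features the bandwidth `w` would need tuning; the value 8.0 here is arbitrary.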

After DLDV feature extraction and dimensionality reduction for 6 s ECG signals, we find that the minimum k satisfying $P\ge 95\%$ in Eq. (9) is $k = 2124$ ($354\times 6$). Finally, we combine FT-IMF$_\mathrm{time}$ with the low-dimensional result to obtain FT-IMF$_\mathrm{all} \in {\mathbb {R}}^{1 \times (k+6)}$:

$$\begin{aligned} {\text {FT-IMF}}_\mathrm{all} = \left[ {\text {FT-IMF}}_\mathrm{time},{\textbf {S}}_{t}\left( {\textbf {X}}_\mathrm{DLDV}\right) \right] . \end{aligned}$$

(10)

Proposed dual-input transformer model

Deep learning-based approaches can automatically extract abstract features from samples. However, their complex convolutional and recurrent structures give the hidden layers a large number of front-to-back dependencies, which leads to low parallelism. The Transformer, the first sequence transduction model based entirely on attention, replaces the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention [23]. Existing studies have shown that the transformer can handle not only translation problems but also the classification of temporal sequences [23], such as ECG sequences [33, 34]. For the first time, we propose a DI-Transformer model for ECG SQA; its overall structure is shown in Fig. 5. Our DI-Transformer model mainly includes the transformer encoder layer and the classifier layer. Furthermore, the features produced by feature extraction and KPCA are plugged into our model as augmented features. Note that the transformer encoder layer is formed by stacking six attention modules, each of which includes a multi-head attention block with six heads; the specific composition of the multi-head attention mechanism is described in [23, 35, 36]. Since ECG classification does not require a standard translation process, we replace the decoder part of the transformer with a fully connected layer. We describe the DI-Transformer in detail as follows.

Transformer encoder layer

Input embedding and positional encoding: The input embedding of the sequential signal is similar to methods in most natural language processing (NLP) architectures [37]. To get the embedding for each point, the Raw ECG or FT-IMF$_\mathrm{all}$ is mapped to the $d_\mathrm{model}$ dimensional space through 1D convolution. It should be noted that we must ensure the consistency of the sequence length before and after convolution through well-designed padding and kernel size. That is, we must ensure the dimension of the embedding output is also $d_\mathrm{model}$. In addition, we choose the sinusoidal version [23, 36] to provide positional embedding for our input sequence.
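The embedding step described above can be illustrated with a small NumPy sketch: a bank of 1D kernels with "same" padding maps a length-L sequence to an L × $d_\mathrm{model}$ array, to which the sinusoidal positional encoding of [23] is added. The kernel values are random placeholders for learned weights, the sizes are arbitrary, and `np.convolve` flips the kernel (true convolution), which makes no practical difference when the weights are learned.

```python
import numpy as np

def conv1d_embed(x, kernels):
    """Map a sequence of length L to shape (L, d_model).

    kernels: (d_model, kernel_size) with odd kernel_size; mode="same"
    pads so that the output length equals the input length.
    """
    return np.stack([np.convolve(x, k, mode="same") for k in kernels], axis=1)

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encoding; d_model is assumed to be even."""
    pos = np.arange(seq_len)[:, None]
    two_i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

rng = np.random.default_rng(0)
seq_len, d_model, kernel_size = 50, 8, 3      # illustrative sizes
x = rng.normal(size=seq_len)                  # stand-in for a Raw ECG segment
kernels = rng.normal(size=(d_model, kernel_size))
pe = sinusoidal_positions(seq_len, d_model)
U = conv1d_embed(x, kernels) + pe             # encoder input, (50, 8)
```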

Attention module: We stack the attention module six times, and each module consists of two parts (the multi-head attention block and the feed-forward network). The former comprises six parallel attention heads, and its internal structure is shown in Fig. 6. After the “input embedding and positional encoding” operation on the Raw ECG, the input vector U of the transformer encoder layer is obtained. Then, we define three transformation matrices: $\mathrm {W}_{e}^{\mathrm {Q}}\in {\mathbb {R}}^{d_\mathrm{model}\times d_{k}}$, $\mathrm {W}_{e}^{\mathrm {K}}\in {\mathbb {R}}^{d_\mathrm{model}\times d_{k}}$ and $\mathrm {W}_{e}^{\mathrm {V}}\in {\mathbb {R}}^{d_\mathrm{model}\times d_{v}}$, $e = \{1,2,\ldots ,6\}$, and use them to perform three linear transformations on U to get the Query ($Q_{e}$), Key ($K_{e}$) and Value ($V_{e}$). Finally, the e-th head is calculated from $Q_{e}$, $K_{e}$, and $V_{e}$:

$$\begin{aligned} h_{e} = {\text {softmax}}\left( \frac{Q_{e}\cdot K_{e}^\mathrm{T}}{\sqrt{d_{k}}}\right) V_{e}, \end{aligned}$$

(11)

where T represents the operation of matrix transpose. To connect the results of all $h_{e}$, we define the transformation matrix $W^{P}$, and then get the output of the multi-head attention module through a linear mapping operation:

$$\begin{aligned} \mathrm{MHAB}(Q,K,V) = \mathrm{Concat}\left( h_{1},h_{2},\ldots ,h_{6}\right) W^{P}, \end{aligned}$$

(12)

where $W^{P}\in {\mathbb {R}}^{6 d_{v}\times d_\mathrm{model}}$ [23]. A residual connection and a layer normalization are then applied to MHAB(Q, K, V) in the “Add & Norm” block. The result is passed to the feed-forward network (the second part of the attention module), which consists of two fully connected layers with a rectified linear unit (ReLU), followed by another residual connection and layer normalization. Note that we use layer normalization rather than batch normalization. The output of each attention module is denoted $X_\mathrm{attention}$. The output of the transformer encoder layer is either used as the input of the next transformer encoder layer or fused with FT-IMF$_\mathrm{all}$ and fed to the classification layer to determine the final output category.
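Equations (11) and (12), together with the “Add & Norm” step, can be sketched in NumPy as below. The weights are random placeholders for learned parameters, the dimensions are arbitrary small values rather than those used in the paper, and the layer normalization omits the learnable scale and shift for brevity.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)     # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_head(U, Wq, Wk, Wv):
    """Eq. (11): one scaled dot-product attention head."""
    Q, K, V = U @ Wq, U @ Wk, U @ Wv
    weights = softmax(Q @ K.T / np.sqrt(Wk.shape[1]))  # divide by sqrt(d_k)
    return weights @ V

def mhab(U, heads, Wp):
    """Eq. (12): concatenate all heads, then project with W^P."""
    return np.concatenate([attention_head(U, *w) for w in heads], axis=-1) @ Wp

def add_and_norm(x, sublayer_out, eps=1e-5):
    """Residual connection followed by layer normalization (no scale/shift)."""
    y = x + sublayer_out
    mu = y.mean(axis=-1, keepdims=True)
    var = y.var(axis=-1, keepdims=True)
    return (y - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
seq_len, d_model, d_k, d_v, n_heads = 10, 12, 4, 4, 6   # illustrative sizes
U = rng.normal(size=(seq_len, d_model))
heads = [
    (rng.normal(size=(d_model, d_k)),
     rng.normal(size=(d_model, d_k)),
     rng.normal(size=(d_model, d_v)))
    for _ in range(n_heads)
]
Wp = rng.normal(size=(n_heads * d_v, d_model))
out = add_and_norm(U, mhab(U, heads, Wp))     # shape (seq_len, d_model)
```

Because the heads do not depend on one another, the loop over `heads` could run in parallel, which is the parallelism advantage mentioned for the multi-head attention module.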

Dual-input features fusion and classification

In the model initialization phase, we extract the FT-IMF$_\mathrm{time}$ and FT-IMF$_\mathrm{freq}$ features through the proposed method and perform KPCA on FT-IMF$_\mathrm{freq}$. They are then concatenated and used as the second-channel input feature (FT-IMF$_\mathrm{all}$) of the DI-Transformer. In the training phase, the Raw ECG of the first channel is divided into mini-batches, positionally encoded, and then fed into the transformer encoder layer. For each iteration, we randomly select 6 s of data from each Raw ECG sample (follow-up experiments show that a length of 6 s is optimal). After the Raw ECG passes through the transformer encoder layer, the extracted feature map is flattened and concatenated with the FT-IMF$_\mathrm{all}$ features prepared in the model initialization phase:

$$\begin{aligned} X_\mathrm{hidden} = \left[ \mathrm{concat}\left( X_\mathrm{attention}^{1}, \ldots X_\mathrm{attention }^{6}\right) , {\text {FT-IMF}}_\mathrm{all}\right] . \end{aligned}$$

(13)

$X_\mathrm{hidden}$ then passes through a linear layer (a 1D fully connected layer with input dimension $d_\mathrm{in}$), which is followed by a softmax function. The softmax scores are compared with the corresponding input labels to calculate the cross-entropy loss. Finally, the classification layer outputs a vector $V = (v_{1},v_{2})$, where $v_{i}$ denotes the probability that the segment belongs to class i (good quality or bad quality).
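The fusion and classification step of Eq. (13) can be sketched as below: the flattened encoder features are concatenated with FT-IMF$_\mathrm{all}$, passed through one linear layer, and mapped to two class probabilities by a softmax. This is an illustrative sketch in plain Python under simplifying assumptions; the function names, the toy dimensions and the weights are hypothetical, and training (gradient updates) is not shown.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(encoder_features, ft_imf_all, weight, bias):
    """Concatenate both inputs (X_hidden), apply a linear layer, return class probabilities."""
    x_hidden = list(encoder_features) + list(ft_imf_all)
    scores = [sum(w * v for w, v in zip(row, x_hidden)) + b
              for row, b in zip(weight, bias)]
    return softmax(scores)

def cross_entropy(probs, label):
    """Cross-entropy loss for one sample with an integer class label."""
    return -math.log(probs[label])

# Toy example: 3 encoder features + 2 hand-crafted features -> d_in = 5, two classes.
probs = classify([0.2, -0.1, 0.4], [1.0, 0.5],
                 weight=[[0.3, 0.1, -0.2, 0.5, 0.0],
                         [-0.3, 0.2, 0.1, -0.4, 0.1]],
                 bias=[0.0, 0.0])
loss = cross_entropy(probs, label=0)
```

In the paper's configuration the linear layer has input dimension $d_\mathrm{in} = 2648$ (Table 2); the toy example uses five inputs only to keep the arithmetic visible.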

Experiments and results

ECG database and experimental setting

ECG database

This paper employs the PhysioNet/Computing in Cardiology Challenge 2011 (PCCC) [38] database to test the proposed SQA method. The PCCC includes 1500 10 s standard 12-lead ECG recordings sampled at 500 Hz, split into two subsets: set-a includes 1000 12-lead 10 s recordings, and set-b includes 500 12-lead 10 s recordings. This paper employs set-a, which contains 9276 ($773\times 12$) 10 s good quality (“acceptable”) ECGs and 2700 ($225\times 12$) 10 s bad quality (“unacceptable”) ECGs. In addition, we select 500 single-lead good quality records and 500 single-lead bad quality records from the PCCC to form a test set (test-a). To build a second test set, we randomly select em or ma noise (after oversampling) and use it to contaminate one of the 500 selected good quality records according to the method in [39]; repeating this process 500 times generates 500 records contaminated with em and ma noise. These 500 generated bad quality records and the 500 good quality records selected from the PCCC are combined into a test set (test-b). The details of each dataset are described in Table 1. Fig. 7 shows randomly selected good quality and bad quality segments from set-a. Note that each 10 s record in all datasets is normalized with the Z-score, which is calculated as follows:

Table 1

Details of the datasets used in this paper

Data	Good quality	Bad quality	Total
Set-a	9276	2700	11,976
Test-a	500	500	1000
Test-b	500	500	1000

$$\begin{aligned} {z} = \frac{x-\mu }{\sigma }, \end{aligned}$$

(14)

where x denotes the signal segments, $\mu $ and $\sigma $ are the mean value and standard deviation of the signal segments, respectively.
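The Z-score normalization of Eq. (14) can be sketched with the Python standard library as follows. The paper does not say whether the population or the sample standard deviation is used; this sketch assumes the population form, and the handling of a constant (zero-variance) segment is likewise an assumption added so that the function never divides by zero.

```python
import statistics

def z_score(segment):
    """Normalize one record to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(segment)
    sigma = statistics.pstdev(segment)
    if sigma == 0:
        # Assumption: a flat segment carries no shape information, so map it to zeros.
        return [0.0 for _ in segment]
    return [(x - mu) / sigma for x in segment]

normalized = z_score([1.0, 2.0, 3.0, 4.0, 5.0])
```

For a 10 s record at 500 Hz the input would hold 5000 samples; the call is the same.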

Experimental setting

Model parameter settings: The key parameters of the DI-Transformer model are shown in Table 2. Note that, owing to the physiological characteristics of the human body, ECG signal strength is limited to a certain range, which means that the numerical difference between peaks and troughs is small, so $d_\mathrm{model}$ is set to 512 [33]. In addition, to converge rapidly and prevent oscillation near a local minimum, the learning rate is dynamically adjusted during training.

The whole method is developed and trained using TensorFlow and PyTorch. Our experiments are performed on a computer with an Intel(R) Core(TM) i5-7640X CPU @ 4.00 GHz, equipped with two GeForce GTX 1080 Ti GPUs with 11 GB RAM.

Performance evaluation: To evaluate the performance of the proposed method for SQA, we adopt five-fold cross-validation. Set-a is randomly divided into five equal subsets; each subset is used as the test set in turn, and the remaining four subsets are used for training. However, less than a quarter of the data is labeled as bad quality. It is well known that building classifiers on an unbalanced dataset causes bias and results in poor generalization of the classification models. When prior probabilities (and Bayesian training paradigms) are not used, balancing the dataset is one way to overcome this problem. Therefore, we balance the dataset by adding real noise [em and ma noise from NSTDB [40] and additive Gaussian white noise (AGWN)] to good quality segments to generate additional bad quality data. Note that we oversample the em and ma noises to 500 Hz before adding them to the training subset, and the sampling rate of the AGWN is also 500 Hz. The method of balancing the dataset is described in [39]. For each cross-validation task, we balance the training subset (containing $7421\approx 9276/5 \times 4$ 10 s good quality segments and $6838\approx 2700/5\times 4+4678$ 10 s bad quality segments) but keep the test subset unchanged (containing $1855\approx 9276/5$ 10 s good quality segments and $540\approx 2700/5$ 10 s bad quality segments).
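The subset sizes quoted above follow directly from the five-fold split of set-a. The short calculation below reproduces them; the constant and function names are illustrative, and the 4678 generated bad quality segments are taken from the text rather than derived.

```python
GOOD_TOTAL = 9276      # good quality segments in set-a
BAD_TOTAL = 2700       # bad quality segments in set-a
N_FOLDS = 5
SYNTHETIC_BAD = 4678   # bad quality segments generated by adding noise (from the text)

def fold_counts(total, n_folds=N_FOLDS):
    """Return (training size, test size) of one class for a single fold."""
    test = total / n_folds
    return total - test, test

good_train, good_test = fold_counts(GOOD_TOTAL)   # 7420.8 and 1855.2 before rounding
bad_train, bad_test = fold_counts(BAD_TOTAL)      # 2160.0 and 540.0
balanced_bad_train = bad_train + SYNTHETIC_BAD    # 6838.0
```

After balancing, each training subset holds roughly 7421 good and 6838 bad quality segments, while the test subset keeps its original imbalance of about 1855 to 540.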

In addition, we employ multiple indicators to evaluate the performance of the proposed method: sensitivity (Se), specificity (Sp), precision ($P_{+}$), accuracy (Acc), $F_{1}$ and area under the curve (AUC) [41]. It should be noted that for extremely unbalanced data (i.e., a low prevalence or incidence of a disease in the total population), the ROC curve and AUC are only partially meaningful. For this problem, Carrington et al. [42] give an effective solution. Here, we balanced the training set. The definitions of these indicators are as follows:

$$\begin{aligned} {\text {Se}}(\%)= & {} \frac{\mathrm{T P}}{\mathrm{T P}+\mathrm{F N}} \times 100 \%, \end{aligned}$$

(15)

$$\begin{aligned} {\text {Sp}}(\%)= & {} \frac{\mathrm{TN}}{\mathrm{TN} +\mathrm{FP}} \times 100 \%, \end{aligned}$$

(16)

$$\begin{aligned} \mathrm {P}_{+}(\%)= & {} \frac{\mathrm{T P}}{\mathrm{T P}+\mathrm{F P}} \times 100 \%, \end{aligned}$$

(17)

$$\begin{aligned} {\text {Acc}}(\%)= & {} \frac{\mathrm{T P}+\mathrm{T N}}{\mathrm{T P}+\mathrm{T N}+\mathrm{F P}+\mathrm{F N}} \times 100 \%, \end{aligned}$$

(18)

$$\begin{aligned} \mathrm {F}_{1}(\%)= & {} \frac{2 P_{+} \times \mathrm{Se}}{P_{+}+\mathrm{Se}} \times 100 \%, \end{aligned}$$

(19)

where TP is true positives, TN is true negatives, FP is false positives and FN is false negatives.
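Equations (15) to (19) can be computed from the four confusion-matrix counts as sketched below. Two points are assumptions rather than statements from the paper: good quality is treated as the positive class, and $F_{1}$ is computed from the fractional values of $P_{+}$ and Se before the final conversion to percent. The counts in the example are illustrative values for one test fold (1855 good and 540 bad quality segments), not the paper's actual confusion matrix.

```python
def sqa_metrics(tp, tn, fp, fn):
    """Return Se, Sp, P+, Acc and F1 in percent from confusion-matrix counts."""
    se = tp / (tp + fn)                      # sensitivity, Eq. (15)
    sp = tn / (tn + fp)                      # specificity, Eq. (16)
    p_plus = tp / (tp + fp)                  # precision, Eq. (17)
    acc = (tp + tn) / (tp + tn + fp + fn)    # accuracy, Eq. (18)
    f1 = 2 * p_plus * se / (p_plus + se)     # F1 from fractional P+ and Se, Eq. (19)
    return {"Se": 100 * se, "Sp": 100 * sp, "P+": 100 * p_plus,
            "Acc": 100 * acc, "F1": 100 * f1}

# Illustrative counts: 6 good segments rejected, 3 bad segments accepted.
example = sqa_metrics(tp=1849, tn=537, fp=3, fn=6)
```

Because the test subsets are unbalanced, Acc is dominated by the good quality class, which is one reason Se, Sp and $F_{1}$ are reported alongside it.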

Table 2

The parameter setting of the proposed DI-Transformer model

Parameters	Value	Notes
Batch_size	32	The number of samples fed into the model each time
${d}_{\text {model}}$	512	Input size of transformer encoder layer
${d}_{\text {in}}$	2648	The input dimension of the linear layer
Num_module	6	The number of attention module
Num_heads	6	Number of heads in each multi-head module
$N_\mathrm{seg}$	6 s	The length of each ECG segment
Dropout_ratio	0.1	The proportion of neurons randomly discarded during the training phase
Optimizer	–	The Adam optimizer
Learning rate	0.01	The initial value was 0.01 and randomly changed every five epochs from 0.01 to 0.001
k	2124	Dimensions that satisfy the condition ($P\ge 95\%$) after KPCA
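Table 2 lists k = 2124 as the number of dimensions kept after KPCA under the condition $P\ge 95\%$. One common reading, assumed here rather than stated in the paper, is that P is the cumulative share of the kernel-matrix eigenvalues covered by the leading components. Under that assumption, k can be chosen as sketched below; the function name and the toy eigenvalues are hypothetical.

```python
def components_for_ratio(eigenvalues, threshold=0.95):
    """Smallest k whose leading eigenvalues cover at least `threshold` of the total."""
    ordered = sorted((v for v in eigenvalues if v > 0), reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)

# Toy spectrum: the first three components cover 96% of the total.
k_toy = components_for_ratio([50.0, 30.0, 16.0, 3.0, 1.0])
```

With the eigenvalues of the kernel matrix built from FT-IMF$_\mathrm{freq}$, the same rule would return the dimension reported in Table 2 if the assumption about P holds.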

Experiments results

Table 3

The main parameter settings of each model in the Python development environment

Model	Hyperparameter	Model	Hyperparameter
G-SVM	Parameter:25	RF	Criterion: ‘Gini’
	Gamma: scale		n_estimators = 300
	Kernel: ‘rbf’		Random_state = 0
	class_weight: balanced		n_jobs = 2
LR	Penalty: ‘l2’	KNN	n_neighbors = 5
	Solver: liblinear		Weights = ‘uniform’
	Tol: default		Algorithm = ‘auto’
	max_iter:1000		leaf_size = 30
	class_weight:balanced		Metric = ‘minkowski’

Performance evaluation of DLDV features

To evaluate the performance of the DLDV features extracted by our method, we employ four traditional classifiers: Gaussian Kernel Support Vector Machines (G-SVM) [43], Logistic Regression (LR) [44], Random Forests (RF) [45], and K-Nearest Neighbors (KNN) [46]. The parameter settings of each classifier are shown in Table 3. As references, we use six time-frequency dependent SQIs [10, 47, 48]: sSQI, kSQI, pSQI, LpSQI, MpSQI and HpSQI. Table 4 shows the binary classification results of ECG signal quality for each feature on the four classifiers, and Fig. 8 shows the confusion matrices obtained with the DLDV features (FT-IMF$_\mathrm{freq}$) on the four classifiers. Table 4 shows that our DLDV features outperform the six traditional SQIs on G-SVM, LR, RF and KNN. The DLDV features achieve their best performance on G-SVM, with Se, $P_{+}$ and Acc of 93.42, 97.85 and 93.32%, respectively. Among the six comparison SQIs, sSQI achieves the best performance, on KNN, with Se, $P_{+}$ and Acc of 89.91, 93.27 and 87.92%, respectively. Even so, its Acc is still 5.40% lower than that of our method. These results show that the DLDV features outperform the six comparison SQIs.

Table 4

Average results of five-fold cross-validation performed on set-a using four classifiers for each SQI and FT-IMF feature

SQIs	G-SVM			LR			RF			KNN
SQIs	Se	$P_{+}$	Acc	Se	$P_{+}$	Acc	Se	$P_{+}$	Acc	Se	$P_{+}$	Acc
sSQI [47]	86.19	87.31	84.19	78.36	84.43	78.36	88.42	92.14	86.27	89.91	93.27	87.92
kSQI [47]	85.73	86.77	83.73	70.68	79.36	76.69	86.73	90.81	86.11	87.28	91.54	86.28
pSQI [10]	85.99	87.29	84.00	84.89	85.81	82.50	85.63	89.21	85.22	86.87	90.32	82.23
LpSQI [48]	84.44	86.74	83.79	78.74	83.13	76.22	83.76	87.00	83.34	80.55	89.45	83.26
MpSQI [48]	84.76	86.65	84.21	81.49	85.12	79.14	85.49	88.83	84.86	85.47	90.73	84.02
HpSQI [48]	83.43	88.75	83.44	78.74	86.83	78.61	84.45	87.89	83.16	82.45	89.74	84.77
FT-IMF$_\mathrm{time}$	82.57	89.41	85.99	80.68	85.52	81.72	85.33	86.95	84.25	84.29	90.97	83.05
FT-IMF$_\mathrm{freq}$	93.42	97.85	93.32	88.52	95.35	87.76	88.25	97.69	92.73	93.20	97.63	92.98

Bold indicates that FT-IMF$_\mathrm{freq}$ achieves the best performance compared with the other SQIs. Note that DLDV and FT-IMF$_\mathrm{freq}$ denote the same feature

To further test the proposed method, instead of combining SQIs at random to train the classification model, we build new combinations of SQIs in decreasing order of the average accuracy of the six SQIs on the four classifiers. These combinations are then compared with DLDV and FT-IMF$_\mathrm{all}$, and the results on each classifier are shown in Table 5. The combination of all six SQIs has the highest Acc among the SQI combinations, but it is still lower than the Acc of DLDV and FT-IMF$_\mathrm{all}$. This shows that our features perform better than the six traditional SQIs. Furthermore, our DLDV feature performs best on G-SVM (Acc = 93.32%), which benefits from both the DLDV features and the superior performance of the SVM classifier based on the Gaussian kernel function. The results obtained on KNN (Acc = 92.98%) are slightly inferior to those on G-SVM. Our features perform worst on LR (Acc = 87.76%), which is even lower than SQI$_\mathrm{features}$ on KNN (Acc = 89.98%), but still ahead of the combination of all six SQIs on LR (Acc = 83.74%). This indicates that our method outperforms these six traditional SQIs in quality classification.

Comparison of our DI-Transformer and four traditional classifiers

This section compares our DI-Transformer with four traditional methods (G-SVM, LR, RF and KNN). Four features (SQI$_\mathrm{features}$, FT-IMF$_\mathrm{time}$, FT-IMF$_\mathrm{freq}$ and FT-IMF$_\mathrm{all}$) are used to build the five types of classifiers, and the results on the test set are shown in Table 6. Among the four traditional models built with SQI$_\mathrm{features}$, KNN achieves the highest accuracy (Acc = 89.98%), which is still lower than that of DI-Transformer (Acc = 91.26%). The classification models built with FT-IMF$_\mathrm{all}$ generally perform better than those built with SQI$_\mathrm{features}$. With FT-IMF$_\mathrm{all}$, the Acc on G-SVM (94.27%) is higher than on KNN (93.64%), but Table 7 and Fig. 13b show that KNN (AUC = 0.962) outperforms G-SVM (AUC = 0.921) in terms of AUC. More importantly, combined with FT-IMF$_\mathrm{all}$, our DI-Transformer achieves the best overall performance (Acc = 99.62% and AUC = 0.993). The p values in Table 8 show that the differences in AUC between the proposed DI-Transformer and the four traditional classifiers are statistically significant at the 0.05 level, except for FT-IMF$_\mathrm{time}$ on G-SVM (p = 0.8971).

Table 5

Average Acc of fivefold cross-validation performed on balanced set-a using four classifiers for the different combinations of SQIs

Combination of SQIs	G-SVM	LR	RF	KNN
sSQI, kSQI	84.77	83.62	86.96	87.93
sSQI, kSQI, pSQI	85.67	82.32	87.01	88.02
sSQI, kSQI, pSQI, MpSQI	86.93	82.21	88.77	89.06
sSQI, kSQI, pSQI, MpSQI, HpSQI	87.52	82.93	88.86	89.42
sSQI, kSQI, pSQI, MpSQI, HpSQI, LpSQI (SQI$_\mathrm{features}$)	88.64	83.74	89.62	89.98
FT-IMF$_\mathrm{freq}$ (DLDV)	93.32	87.76	92.73	92.98
FT-IMF$_\mathrm{freq}$ + FT-IMF$_\mathrm{time}$ (FT-IMF$_\mathrm{all}$)	94.27	91.42	93.42	93.64

Bold indicates that, compared with the other combinations of SQIs, FT-IMF$_\mathrm{all}$ has the best performance on G-SVM, followed by KNN. Note that DLDV and FT-IMF$_\mathrm{freq}$ denote the same feature

Ablation study on DI-Transformer model

In this section, we design a series of ablation experiments to comprehensively evaluate the proposed DI-Transformer. Experiment A uses only the FT-IMF$_\mathrm{freq}$ feature as the input to train the transformer-based model. Building on experiment A, experiment B uses both FT-IMF$_\mathrm{freq}$ and FT-IMF$_\mathrm{time}$ as the input to train the transformer-based model. Experiment C uses only Raw ECG as the input to train the transformer model. Building on C, experiment D treats FT-IMF$_\mathrm{time}$ as an augmented feature, which is concatenated with the output of the transformer encoder layer and fed to the classification layer. Experiment E encodes the Raw ECG as the input of the transformer and uses the dimension-reduced FT-IMF$_\mathrm{freq}$ as an augmented feature, which is fed into the classification layer along with the output of the transformer encoder layer (see Fig. 5). Building on experiment E, experiment F treats both FT-IMF$_\mathrm{freq}$ and FT-IMF$_\mathrm{time}$ as augmented features, which are concatenated with the output of the transformer encoder layer and fed to the classification layer. Note that experiments A, B and C use a single-input structure, whereas experiments D, E and F use feature augmentation with a dual-input structure, whose main advantage is that it can fully utilize the depth local dual-view features.

Table 6

The Acc values obtained by different features on five classifiers

Features	G-SVM	LR	RF	KNN	DI-Transformer
SQI$_\mathrm{features}$	88.64	83.74	89.62	89.98	91.26
FT-IMF$_\mathrm{time}$	85.99	81.72	84.25	83.05	93.74
FT-IMF$_\mathrm{freq}$	93.32	87.76	92.73	92.98	98.58
FT-IMF$_\mathrm{all}$	94.27	91.42	93.42	93.64	99.62

Notice the bold indicates that several classes of features achieve the best performance on the classifier built by DI-Transformer

Table 7

The AUC values obtained by different features on five classifiers

Features	G-SVM	LR	RF	KNN	DI-Transformer
SQI$_\mathrm{features}$	0.920	0.780	0.925	0.929	0.932
FT-IMF$_\mathrm{time}$	0.905	0.771	0.885	0.902	0.904
FT-IMF$_\mathrm{freq}$	0.914	0.902	0.936	0.959	0.985
FT-IMF$_\mathrm{all}$	0.921	0.919	0.948	0.962	0.993

Bold indicates that several classes of features achieve the highest AUC on the classifier built by DI-Transformer compared with the other four classifiers

Table 8

The p values of AUC between DI-Transformer and traditional methods by using different features

Features	G-SVM	LR	RF	KNN
SQI$_\mathrm{features}$	1.51e${-}3{*}$	1.43e${-}7{*}$	1.60e${-}6{*}$	4.93e${-}3{*}$
FT-IMF$_\mathrm{time}$	0.8971	4.04e${-}6{*}$	2.35e${-}5{*}$	2.76e${-}4{*}$
FT-IMF$_\mathrm{freq}$	5.03e${-}4{*}$	7.93e${-}3{*}$	5.68e${-}6{*}$	4.84e${-}3{*}$
FT-IMF$_\mathrm{all}$	6.11e${-}3{*}$	8.15e${-}3{*}$	1.85e${-}4{*}$	1.11e${-}3{*}$

*Significance at 0.05 level

Table 9

Results of ablation studies performed on DI-Transformer with five-fold cross-validation

No.	Input features	Input mode	Se	Sp	$P_{+}$	Acc	$F_{1}$
A	FT-IMF$_\mathrm{freq}$	Single-Input	94.66	98.33	99.49	95.49	97.02
B	FT-IMF$_\mathrm{freq}$+FT-IMF$_\mathrm{time}$	Single-Input	97.78	97.41	99.23	97.70	98.51
C	Raw ECG	Single-Input	90.89	98.33	99.47	92.57	94.99
D	Raw ECG, FT-IMF$_\mathrm{time}$	Dual-Input	92.51	97.96	99.36	93.74	95.81
E	Raw ECG, FT-IMF$_\mathrm{freq}$	Dual-Input	98.38	99.26	99.78	98.58	99.08
F	Raw ECG, FT-IMF$_\mathrm{freq}$ + FT-IMF$_\mathrm{time}$	Dual-Input	99.68	99.44	99.83	99.62	99.76

Bold indicates that experiment F achieves the best performance among the ablation experiments

Table 9 shows the results of the ablation experiments, and Fig. 9 shows the six corresponding confusion matrices. As shown in Table 9, the Acc of the transformer-based model reaches 95.49% in experiment A and 92.57% in experiment C. Compared with experiment C, the Acc of experiment E (DI-Transformer model) increases by 6.01%, which shows that FT-IMF$_\mathrm{freq}$ as an augmented feature significantly improves the performance of the model. Comparing experiments A and C, we find that feeding FT-IMF$_\mathrm{freq}$ into the transformer improves the classification performance more effectively than feeding in the Raw ECG directly. In experiment B, the Acc of the transformer-based model reaches 97.70% and the $F_{1}$ reaches 98.51%. Comparing experiments A and B, FT-IMF$_\mathrm{time}$ also improves the classification performance of the model as an additional feature, but its contribution is not as significant as that of FT-IMF$_\mathrm{freq}$. Experiment F maximizes the performance of the proposed DI-Transformer method: its Se, Sp, $P_{+}$ and Acc values reach 99.68, 99.44, 99.83 and 99.62%, respectively. As shown in Fig. 9f, only 0.25% of the good quality data are misclassified as bad quality data. These results show that our DI-Transformer performs much better than G-SVM and KNN.
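The pairwise comparisons discussed above reduce to differences between the Acc column entries of Table 9. The short calculation below makes those differences explicit; the dictionary and function names are illustrative, and the values are copied from Table 9.

```python
# Acc (%) of each ablation experiment, copied from Table 9.
ACC = {"A": 95.49, "B": 97.70, "C": 92.57, "D": 93.74, "E": 98.58, "F": 99.62}

def gain(base, variant):
    """Accuracy gain (percentage points) of `variant` over `base`."""
    return round(ACC[variant] - ACC[base], 2)

gains = {
    "C->E (add FT-IMF_freq as second input)": gain("C", "E"),
    "C->D (add FT-IMF_time as second input)": gain("C", "D"),
    "A->B (add FT-IMF_time, single input)": gain("A", "B"),
    "E->F (add FT-IMF_time to dual input)": gain("E", "F"),
    "A->E (add Raw ECG through the encoder)": gain("A", "E"),
}
```

Adding FT-IMF$_\mathrm{freq}$ to the Raw ECG model raises Acc by 6.01 points, whereas adding FT-IMF$_\mathrm{time}$ to the same model raises it by 1.17 points, which is the basis for ranking the two feature groups.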

Performance of each model to recognize the MA noise

First, we select the four traditional classification models trained with the SQI$_\mathrm{features}$ and FT-IMF$_\mathrm{all}$ features, respectively. The performance of these models is then tested on artificial test sets with a progressively increasing share of MA-contaminated ECG segments. To test the ability of each model to identify MA-contaminated ECG, we generate a series of test sets with a fixed total number of samples (1000) by adjusting the proportion of data drawn from test-a and test-b. We take data from test-a and test-b at ratios of 8:2, 6:4, 4:6 and 2:8, and denote the resulting test sets as test-ab1, test-ab2, test-ab3 and test-ab4, respectively. The results of the four traditional classifiers trained with SQI$_\mathrm{features}$ on each test set are shown in Table 10. As the proportion of MA-contaminated ECG segments increases, the Acc of all four classifiers decreases to different degrees. KNN performs better than the other three classifiers at most proportions. Figure 10a shows the results obtained on test-ab1 and test-ab4, from which it can be seen that these classifiers are sensitive to MA noise. The results of the five classifiers trained with FT-IMF$_\mathrm{all}$ on each test set are shown in Table 11. As the proportion of MA-contaminated ECG segments increases from test-ab1 to test-ab4, the accuracy of all five classifiers also decreases, but by much less than in Table 10. The results on test-ab1 and test-ab4 in Fig. 10b confirm this. The results in Tables 12 and 13 illustrate that the contribution of our FT-IMF$_\mathrm{all}$ features to identifying MA noise is significant at the 0.05 level, and that our DI-Transformer based on FT-IMF$_\mathrm{all}$ outperforms the four conventional classifiers across the board in recognizing MA noise.
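The construction of test-ab1 to test-ab4 can be sketched as a fixed-size draw from the two source sets. The paper specifies only the ratios and the total of 1000 samples; uniform sampling without replacement, the fixed seed and the function name are assumptions made for this illustration.

```python
import random

RATIOS = {"test-ab1": (8, 2), "test-ab2": (6, 4), "test-ab3": (4, 6), "test-ab4": (2, 8)}

def mix_test_sets(test_a, test_b, parts_a, parts_b, total=1000, seed=0):
    """Draw `total` records from test_a and test_b at the ratio parts_a:parts_b."""
    n_a = total * parts_a // (parts_a + parts_b)
    n_b = total - n_a
    rng = random.Random(seed)
    # Sampling without replacement keeps every record unique within the mixed set.
    return rng.sample(test_a, n_a) + rng.sample(test_b, n_b)

# Placeholder records: ("a", i) stands for a test-a record, ("b", i) for a test-b record.
test_a = [("a", i) for i in range(1000)]
test_b = [("b", i) for i in range(1000)]
mixed = {name: mix_test_sets(test_a, test_b, pa, pb) for name, (pa, pb) in RATIOS.items()}
```

With these ratios, test-ab1 contains 200 records from test-b and test-ab4 contains 800, so the share of MA-contaminated segments grows step by step.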

Table 10

Acc of SQI$_\mathrm{features}$ on the four traditional classifiers

Testing set	G-SVM	LR	RF	KNN
Test-ab1	88.51	83.68	88.36	89.67
Test-ab2	84.98	80.26	85.97	86.66
Test-ab3	82.25	79.58	84.82	84.75
Test-ab4	79.56	76.49	81.54	81.71

Notice the bold indicates that among the four traditional classifiers built by SQI_features, KNN has the best performance in identifying MA

Optimal data length and computational time

To find the optimal segment length ($N_\mathrm{seg}$) for SQA, we repeat experiment F ten times on set-a with $N_\mathrm{seg}$ varying from 1 to 10 s in increments of 1 s. Throughout the experiment, we change only $N_\mathrm{seg}$; the relationship between $N_\mathrm{seg}$ and the accuracy of SQA is shown in Fig. 11a. As $N_\mathrm{seg}$ increases, the accuracy of quality classification of our model also increases. However, once $N_\mathrm{seg}$ exceeds 6 s, the accuracy hardly improves, which suggests that a 6 s segment already covers most of the features required for signal quality classification. Fig. 11b shows the relationship between segment length and training and testing times. The training and testing times increase slowly while $N_\mathrm{seg}$ is within 5 s and rise rapidly once it exceeds 6 s. Weighing classification accuracy against computational cost based on Fig. 11a, b, we choose $N_\mathrm{seg} = 6\,\mathrm{s}$ as the optimal signal segment length.
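With a sampling rate of 500 Hz, a segment length of $N_\mathrm{seg} = 6$ s corresponds to 3000 samples taken from a 5000-sample record. A random crop of that length, as used during training, can be sketched as follows; the uniform choice of the start index and the function name are assumptions, since the paper only states that 6 s of data are selected at random.

```python
import random

FS_HZ = 500  # sampling rate of the PCCC recordings

def random_crop(record, n_seg_s=6, fs=FS_HZ, rng=random):
    """Return a contiguous window of n_seg_s seconds starting at a random position."""
    length = n_seg_s * fs
    if length > len(record):
        raise ValueError("record is shorter than the requested segment")
    start = rng.randrange(len(record) - length + 1)
    return record[start:start + length]

# A 10 s record holds 5000 samples; the crop keeps 3000 consecutive ones.
record = list(range(5000))
segment = random_crop(record, rng=random.Random(0))
```

Varying `n_seg_s` from 1 to 10 reproduces the range of segment lengths explored in the experiment.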

Table 11

Acc of FT-IMF$_\mathrm{all}$ on all five classifiers

Testing set	G-SVM	LR	RF	KNN	DI-Transformer
Test-ab1	94.01	90.85	92.33	93.21	98.32
Test-ab2	93.82	90.12	92.98	93.04	97.11
Test-ab3	93.66	89.88	92.06	92.74	97.78
Test-ab4	93.56	89.77	91.81	92.68	97.69

Bold indicates that among the five classifiers built with FT-IMF$_\mathrm{all}$, DI-Transformer achieves the best performance in identifying MA

Table 12

p values of Acc between SQI$_\mathrm{features}$ and FT-IMF$_\mathrm{all}$ on the four traditional classifiers

Testing set	G-SVM	LR	RF	KNN
Test-ab1	2.21e${-}5{*}$	2.28e${-}6{*}$	1.67e${-}3{*}$	1.81e${-}3{*}$
Test-ab2	4.39e${-}4{*}$	1.28e${-}5{*}$	2.25e${-}4{*}$	2.88e${-}4{*}$
Test-ab3	1.87e${-}5{*}$	3.19e${-}6{*}$	4.92e${-}4{*}$	1.59e${-}5{*}$
Test-ab4	2.01e${-}6{*}$	1.91e${-}8{*}$	2.21e${-}6{*}$	1.46e${-}6{*}$

*Significance at 0.05 level

Table 13

p values of Acc between the four traditional classifiers and our DI-Transformer using FT-IMF$_\mathrm{all}$ features

Testing set	G-SVM	LR	RF	KNN
Test-ab1	5.16e${-}4{*}$	2.29e${-}7{*}$	3.79e${-}7{*}$	6.75e${-}4{*}$
Test-ab2	1.12e${-}3{*}$	1.00e${-}4{*}$	3.17e${-}6{*}$	2.54e${-}3{*}$
Test-ab3	8.14e${-}3{*}$	1.10e${-}5{*}$	2.70e${-}6{*}$	9.89e${-}4{*}$
Test-ab4	4.51e${-}3{*}$	6.27e${-}5{*}$	2.31e${-}6{*}$	3.07e${-}4{*}$

*Significance at 0.05 level

Performance comparison

Table 14

Performance comparison with other methods

Authors	Method	Evaluation	Se	Sp	Acc
Behar et al. [10]	Signal quality index (SQI)	5-fold cross-validation	–	–	99.30
Albaba et al. [49]	Time-frequency features and MG-SVM	5-fold cross-validation	91.00	90.00	93.00
Shahriari et al. [13]	Structural Image Similarity Metric	5-fold cross-validation	83.90	77.70	82.50
G-SVM	FT-IMF$_\mathrm{all}$ and G-SVM	5-fold cross-validation	93.97	94.61	94.27
DI-Transformer	FT-IMF$_\mathrm{all}$ and DI-Transformer	5-fold cross-validation	99.68	99.44	99.62

G-SVM: represents the classification results on G-SVM based on the proposed FT-IMF$_\mathrm{all}$ features. DI-Transformer: represents the classification results on DI-Transformer based on the proposed FT-IMF$_\mathrm{all}$ features

This paper employs the PCCC [38] database, which has also been used in other studies. Table 14 lists some other well-performing methods evaluated on this database. Albaba et al. [49] constructed an SQA pipeline by combining multiple time-frequency domain features with multiple traditional classifiers, and obtained good results with the Medium Gaussian SVM (MG-SVM) classifier. Their method achieves an accuracy of 93.00% on MG-SVM, which is comparable to the result obtained by our FT-IMF$_\mathrm{all}$ features on G-SVM (Acc = 94.27%), but still much lower than that of our DI-Transformer (Acc = 99.62%). Shahriari et al. [13] used the SSIM to compare ECG images obtained from two ECGs at standard scales. They then trained a linear discriminant analysis classifier for SQA, using the SSIM between each image and all templates as the feature vector. Their method obtained the lowest accuracy among the compared methods (Acc = 82.50%). Behar et al. [10] employed indicators such as kSQI, sSQI, pSQI, basSQI, bSQI, pcaSQI and rSQI, and trained an SVM model to evaluate the quality of ECG signals and reduce false alarms, achieving an accuracy of 99.30%. This result is higher than that of our G-SVM based on FT-IMF$_\mathrm{all}$ but slightly lower than that of our DI-Transformer. It is worth noting that our methods have a strong MA noise recognition ability, whereas [10] targets ordinary noisy signals and does not consider the interference of MA noise. Therefore, although their performance metrics are high, the results are not entirely comparable. The methods in [13, 49] also hardly consider the case of MA-contaminated ECG. In addition, the proposed methods have good interpretability and can achieve accurate ECG SQA even when the data include a large amount of MA noise.

Discussion

Analyzing the performance of DLDV features

This paper uses EMD and FFT to extract the DLDV features of ECG signals. Four traditional classifiers (G-SVM, LR, RF, and KNN) are then employed to evaluate the extracted DLDV features, with six traditional time-frequency related SQIs as references. In general, the larger the gap in signal quality, the larger the difference in SQI value. For example, as shown in Fig. 12, because the probability density distributions of signals of different quality differ markedly, the kurtosis (kSQI) and skewness (sSQI) provide effective information for distinguishing good quality signals from bad quality signals. The other four time-frequency related SQIs are all valid SQA indicators that have been verified by researchers and have achieved good results in actual SQA [4, 5, 43, 44]. Therefore, this paper selects them as references to evaluate the confidence of our DLDV features for SQA.

Table 4 and Fig. 8 show the classification results and confusion matrices of the six traditional SQIs and the DLDV features on the four classifiers. The DLDV features outperform these traditional SQIs on all four classifiers; even on LR, where the DLDV features have their lowest accuracy (Acc = 87.76%), they remain ahead of SQI$_\mathrm{features}$ (Acc = 83.74%). Our method comprehensively outperforms the six traditional SQIs because the extracted features not only express the central tendency and dispersion of the signal segment, but also use the phase angle and amplitude–frequency values to express the transient changes of the signal.

Analyzing the performance of each model to recognize the MA noise

We also design experiments to test the proposed method’s ability to recognize MA-contaminated ECGs. Our DLDV features work well for MA-contaminated ECGs, as confirmed by Fig. 10 and Tables 10, 11, 12 and 13. Table 10 reflects the ability of SQI$_\mathrm{features}$ to express MA noise: as the MA noise increases, the accuracy of all four classifiers decreases, and the smallest decrease is 6.82%. Table 11 shows that, under the same conditions, the accuracy of the four traditional classifiers trained with FT-IMF$_\mathrm{all}$ also decreases, but the largest decrease is only 1.08%. Tables 12 and 13 give the statistical analysis of Tables 10 and 11. The p values show that the difference between SQI$_\mathrm{features}$ and FT-IMF$_\mathrm{all}$ in expressing MA noise is statistically significant. The results in Fig. 10a show that SQI$_\mathrm{features}$ is limited in expressing MA noise. Because these metrics are based on human-defined desirable properties of clean signals, they have inherent limitations in expressing potential features of signal quality [17]. In addition, it is difficult to specify by hand the features of MA noise that resembles the ECG signal, so it is not surprising that such feature information is hard to extract using SQI$_\mathrm{features}$. Compared with the results in Fig. 10a, the results obtained by each classifier in Fig. 10b on the two test sets are very close, with an average difference of 0.76%. This shows that classifiers built with our features identify general noise well and, more importantly, also perform strongly in identifying MA noise. Furthermore, our DI-Transformer structure achieves high accuracy on test-ab4. This high accuracy is due not only to the design of the dual-input structure but, more importantly, to the transformer’s self-attention module, which captures the temporal relationships of the signal and combines them with the DLDV features to improve the model’s ability to recognize MA noise. Note that we do not use the FT-IMF$_\mathrm{time}$ feature in this test experiment because this feature can only express the central tendency and dispersion of the signal and cannot fully reflect the transient change of the signal.
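The decreases quoted above can be checked directly from the first and last rows of Tables 10 and 11. The sketch below computes the accuracy drop from test-ab1 to test-ab4 for every classifier; the variable and function names are illustrative, and the values are copied from the two tables.

```python
# (Acc on test-ab1, Acc on test-ab4) per classifier, copied from Tables 10 and 11.
TABLE_10 = {"G-SVM": (88.51, 79.56), "LR": (83.68, 76.49),
            "RF": (88.36, 81.54), "KNN": (89.67, 81.71)}
TABLE_11 = {"G-SVM": (94.01, 93.56), "LR": (90.85, 89.77),
            "RF": (92.33, 91.81), "KNN": (93.21, 92.68),
            "DI-Transformer": (98.32, 97.69)}

def drops(table):
    """Accuracy drop (percentage points) from test-ab1 to test-ab4."""
    return {name: round(first - last, 2) for name, (first, last) in table.items()}

sqi_drops = drops(TABLE_10)      # smallest drop: RF
ft_imf_drops = drops(TABLE_11)   # largest drop: LR
smallest_sqi_drop = min(sqi_drops.values())
largest_ft_imf_drop = max(ft_imf_drops.values())
```

The smallest drop with SQI$_\mathrm{features}$ (6.82 points, on RF) is several times the largest drop with FT-IMF$_\mathrm{all}$ (1.08 points, on LR).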

Analyzing the performance of proposed DI-Transformer

The effectiveness and robustness of our FT-IMF$_\mathrm{all}$ feature for SQA are verified on traditional classifiers (G-SVM, LR, RF, and KNN). Furthermore, we also propose a DI-Transformer SQA method based on the FT-IMF$_\mathrm{all}$ features. Table 9 presents a series of ablation experiments for the proposed DI-Transformer method, and Fig. 9 shows the confusion matrix corresponding to each ablation experiment. The results of experiments C, D, E and F show that the contribution of FT-IMF$_\mathrm{freq}$ to SQA is much more significant than that of FT-IMF$_\mathrm{time}$. The results of experiments C and E show that the proposed dual-input structure significantly improves the model’s classification performance. Feeding FT-IMF$_\mathrm{freq}$ to the transformer as input data (experiment A) is much better than feeding it the Raw ECG directly (experiment C), which shows that DLDV features help the transformer model learn the quality features more easily. This is because the phase angle features represent the transient change of the signals well, and, combined with the amplitude features, this transient change can be quantified. We also observe that the Se value of experiment E is higher than that of experiment C, and that the accuracy of experiment F is the best. This shows that experiment F tends to identify more signal segments as good quality, with the advantage of not missing valuable signals in subsequent processing stages, which is also demonstrated by the confusion matrix in Fig. 9f. From this point of view, the abstract features automatically extracted by the transformer from Raw ECG are complementary to the FT-IMF$_\mathrm{freq}$ features. Comparing the results of A with E and of B with F, we find that the DI-Transformer combines the advantages of DLDV features and transformer-based abstract features, and has higher Se, Sp and Acc values. It obtains more effective signal quality features than the single-input structure (A, B and C).

We also compare the proposed DI-Transformer with four traditional classifiers. Table 6 shows that, on the traditional classifiers, SQI$_\mathrm{features}$ is inferior to our FT-IMF$_\mathrm{all}$ but better than our FT-IMF$_\mathrm{time}$, because FT-IMF$_\mathrm{time}$ does not focus on the nuances of signal and noise. The AUC values in Table 7 show that our DI-Transformer achieves the highest AUC for every feature set except FT-IMF$_\mathrm{time}$, where G-SVM is marginally higher (0.905 vs. 0.904); the best traditional result is KNN combined with FT-IMF$_\mathrm{all}$. Furthermore, the p values in Table 8 show that the differences between DI-Transformer and the traditional classifiers are statistically significant in all cases except FT-IMF$_\mathrm{time}$ on G-SVM. It is not surprising that we get such good results, because our method rarely considers the morphology of Raw ECG and instead mines the depth local features of the signal. We extract not only the transient amplitude features of the intermediate components of the signal (IMFs), but also the transient phase angle features that can express the subtle difference between the signal and the noise (especially MA noise). Equally important, although among the traditional classifiers the accuracy of the FT-IMF$_\mathrm{all}$ features on G-SVM is higher than on KNN, the receiver operating characteristic (ROC) curves of each model in Fig. 13 show that DI-Transformer performs best (AUC = 0.993). Therefore, the DI-Transformer-based model built with FT-IMF$_\mathrm{all}$ provides a new practical solution for SQA. In addition, Fig. 13b shows that, among the traditional classifiers, the KNN model built with FT-IMF$_\mathrm{all}$ exhibits the best performance (AUC = 0.962), followed by RF (AUC = 0.948). If a traditional method is used to build the signal quality classifier, the KNN or RF method based on FT-IMF$_\mathrm{all}$ is therefore preferable under the same conditions.
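The observation that one classifier can have the higher Acc while another has the higher AUC follows from how the two quantities are defined: Acc depends on a single decision threshold, whereas AUC measures how well the scores rank positive above negative samples. The sketch below computes AUC as the fraction of correctly ordered positive-negative pairs (the Mann-Whitney formulation) and shows the effect on made-up scores. The labelling of good quality as 1, the threshold of 0.5 and all score values are assumptions for illustration; none of them are data from the paper.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)

labels = [1, 1, 1, 0, 0, 0]                   # 1 = good quality (assumed convention)
model_x = [0.9, 0.45, 0.4, 0.3, 0.2, 0.1]     # perfect ranking, poorly placed threshold
model_y = [0.9, 0.8, 0.6, 0.7, 0.2, 0.1]      # better thresholded decisions, one ranking error
```

Here `model_y` has the higher accuracy at the 0.5 threshold, while `model_x` has the higher AUC, which mirrors the kind of disagreement between Acc and AUC seen for G-SVM and KNN.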

Conclusion

In summary, we present a novel ECG SQA method that fuses the proposed DLDV features and the DI-Transformer framework to improve the recognition of MA-contaminated ECG. For the first time, we combine DLDV features and a transformer to handle the ECG SQA problem. Specifically, we use EMD and FFT to extract DLDV features of Raw ECG in the time-frequency domain. The extracted DLDV features can identify subtle differences between MA and ECG signals through depth local amplitude and phase angle features. When they are fused with the temporal relationship features extracted by the DI-Transformer, accuracy is significantly improved compared to the method based on traditional SQIs. Experiments on SQA tasks show that the proposed method outperforms the state-of-the-art SQA methods. In addition, our method can not only identify the common types of noise in noise-contaminated ECGs but, more importantly, can also effectively identify MA-contaminated ECG. In the future, we will improve the proposed method and make it suitable for SQA of other physiological signals, such as the electroencephalogram and electromyogram.
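The two views named above, amplitude and phase angle, can be illustrated on a single decomposed component. The sketch below is not the authors' extraction pipeline: the EMD step is omitted, a synthetic cosine stands in for one IMF, and a direct O(N²) discrete Fourier transform is used in place of an FFT so that the example needs only the standard library. The function name `amplitude_and_phase` is hypothetical.

```python
import cmath
import math


def amplitude_and_phase(component):
    """Return per-bin amplitude and phase angle (radians) of a real sequence.

    Uses a direct O(N^2) DFT for clarity; an FFT yields the same values faster.
    """
    n = len(component)
    amplitudes, phases = [], []
    for k in range(n // 2 + 1):            # non-negative frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(component)
        )
        amplitudes.append(abs(coeff))      # amplitude view
        phases.append(cmath.phase(coeff))  # phase-angle view
    return amplitudes, phases


# Synthetic stand-in for one IMF: 5 cycles of a cosine shifted by 45 degrees.
n = 64
imf = [math.cos(2 * math.pi * 5 * t / n + math.pi / 4) for t in range(n)]
amp, ph = amplitude_and_phase(imf)
peak = max(range(len(amp)), key=amp.__getitem__)
print(peak, round(amp[peak], 3), round(ph[peak], 3))
```

For this input the largest amplitude falls in bin 5 and the phase angle at that bin is close to π/4, so the amplitude locates the oscillation while the phase angle records its shift.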

Acknowledgements

This project is supported by the Science and Technology Major Project of Hubei Province (Next-Generation AI Technologies) under Grant 2019AEA170.

Declarations

Conflict of interest

The authors declare no conflicts of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


1. Clifford GD, Azuaje F (2006) Advanced methods and tools for ECG data analysis, vol 10. In: McSharry P (ed). Artech house, Boston

2. Satija U, Ramkumar B, Manikandan MS (2016) A unified sparse signal decomposition and reconstruction framework for elimination of muscle artifacts from ECG signal. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 779–783

3. Nguyen P, Kim J-M (2016) Adaptive ECG denoising using genetic algorithm-based thresholding and ensemble empirical mode decomposition. Inf Sci 373:499–511

4. Hu M, Zhang S, Dong W, Xu F, Liu H (2021) Adaptive denoising algorithm using peak statistics-based thresholding and novel adaptive complementary ensemble empirical mode decomposition. Inf Sci 563:269–289

5. Alyasseri ZAA, Khader AT, Al-Betar MA, Awadallah MA (2018) Hybridizing $\beta$-hill climbing with wavelet transform for denoising ECG signals. Inf Sci 429:229–246

6. Xie X, Liu H, Shu M, Zhu Q, Huang A, Kong X, Wang Y (2021) A multi-stage denoising framework for ambulatory ECG signal based on domain knowledge and motion artifact detection. Future Gener Comput Syst 116:103–116

7. Orphanidou C, Drobnjak I (2016) Quality assessment of ambulatory ECG using wavelet entropy of the HRV signal. IEEE J Biomed Health Inform 21(5):1216–1223

8. Mayer C, Bachler M, Holzinger A, Stein PK, Wassertheurer S (2016) The effect of threshold values and weighting factors on the association between entropy measures and mortality after myocardial infarction in the Cardiac Arrhythmia Suppression Trial (CAST). Entropy 18(4):129

9. Xia Y, Jia H (2017) ECG quality assessment based on multi-feature fusion. In: 2017 13th international conference on natural computation, fuzzy systems and knowledge discovery (ICNC-FSKD). IEEE, pp 672–676

10. Behar J, Oster J, Li Q, Clifford GD (2013) ECG signal quality during arrhythmia and its application to false alarm reduction. IEEE Trans Biomed Eng 60(6):1660–1666

11. Satija U, Ramkumar B, Manikandan MS (2017) Real-time signal quality-aware ECG telemetry system for IoT-based health care monitoring. IEEE Internet Things J 4(3):815–823

12. Zhang Y, Wei S, Zhang L, Liu C (2019) Comparing the performance of random forest, SVM and their variants for ECG quality assessment combined with nonlinear features. J Med Biol Eng 39(3):381–392

13. Shahriari Y, Fidler R, Pelter MM, Bai Y, Villaroman A, Hu X (2017) Electrocardiogram signal quality assessment based on structural image similarity metric. IEEE Trans Biomed Eng 65(4):745–753

14. Holzinger A, Hörtenhuber M, Mayer C, Bachler M, Wassertheurer S, Pinho AJ, Koslicki D (2014) On entropy-based data mining. Interactive knowledge discovery and data mining in biomedical informatics. Springer, Berlin, pp 209–226

15. Liu G, Han X, Tian L, Zhou W, Liu H (2021) ECG quality assessment based on hand-crafted statistics and deep-learned s-transform spectrogram features. Comput Methods Progr Biomed 208:106269

16. Herraiz ÁH, Martínez-Rodrigo A, Bertomeu-González V, Quesada A, Rieta JJ, Alcaraz R (2020) A deep learning approach for featureless robust quality assessment of intermittent atrial fibrillation recordings from portable and wearable devices. Entropy 22(7):733

17. Seeuws N, De Vos M, Bertrand A (2021) Electrocardiogram quality assessment using unsupervised deep learning. IEEE Trans Biomed Eng 69(2):882–893

18. Zhang J, Wang L, Zhang W, Yao J (2018) A signal quality assessment method for electrocardiography acquired by mobile device. In: 2018 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 1–3

19. Magrin-Chagnolleau I, Baraniuk RG (1999) Empirical mode decomposition based time-frequency attributes. In: SEG technical program expanded abstracts 1999. Society of Exploration Geophysicists, pp 1949–1952

20. Oberst U (2007) The fast Fourier transform. SIAM J Control Optim 46(2):496–540

21. Schölkopf B, Smola A, Müller K-R (1997) Kernel principal component analysis. In: International conference on artificial neural networks. Springer, pp 583–588

22. Lee J, McManus DD, Merchant S, Chon KH (2011) Automatic motion and noise artifact detection in Holter ECG data using empirical mode decomposition and statistical approaches. IEEE Trans Biomed Eng 59(6):1499–1506

23. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30

24. Heil CE, Walnut DF (1989) Continuous and discrete wavelet transforms. SIAM Rev 31(4):628–666

25. Rilling G, Flandrin P (2007) One or two frequencies? The empirical mode decomposition answers. IEEE Trans Signal Process 56(1):85–95

26. Wu Z, Huang NE, Long SR, Peng C-K (2007) On the trend, detrending, and variability of nonlinear and nonstationary time series. Proc Natl Acad Sci 104(38):14889–14894

27. Hasan S, Muttaqi KM, Sutanto D (2019) Automated segmentation of the voltage SAG signal using Hilbert Huang transform to calculate and characterize the phase angle jump. In: 2019 IEEE industry applications society annual meeting. IEEE, pp 1–6

28. Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52

29. Schölkopf B, Smola A, Müller K-R (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319

30. Lee J-M, Yoo C, Choi SW, Vanrolleghem PA, Lee I-B (2004) Nonlinear process monitoring using kernel principal component analysis. Chem Eng Sci 59(1):223–234

31. Lee J-M, Yoo C, Lee I-B (2004) Fault detection of batch processes using multiway kernel principal component analysis. Comput Chem Eng 28(9):1837–1847

32. Cai P, Deng X (2020) Incipient fault detection for nonlinear processes based on dynamic multi-block probability related kernel principal component analysis. ISA Trans 105:210–220

33. Yan G, Liang S, Zhang Y, Liu F (2019) Fusing transformer model with temporal features for ECG heartbeat classification. In: 2019 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 898–905

34. Guan J, Wang W, Feng P, Wang X, Wang W (2021) Low-dimensional denoising embedding transformer for ECG classification. In: ICASSP 2021–2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 1285–1289

35. Song H, Rajan D, Thiagarajan JJ, Spanias A (2018) Attend and diagnose: clinical time series analysis using attention models. In: Thirty-second AAAI conference on artificial intelligence

36. Yuan S, He Z, Zhao J, Yuan Z (2021) Low-dimensional depth local dual-view features embedded transformer for electrocardiogram signal quality assessment. In: 2021 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 1137–1144

37. Zhao Z, Wu Y (2016) Attention-based convolutional neural networks for sentence classification. Interspeech 8:705–709

38. Silva I, Moody GB, Celi L (2011) Improving the quality of ECGs collected using mobile phones: the physionet/computing in cardiology challenge 2011. In: 2011 computing in cardiology. IEEE, pp 273–276

39. Li Q, Clifford G (2011) Signal quality indices and data fusion for determining acceptability of electrocardiograms collected in noisy ambulatory environments. Comput Cardiol 38:1

40. Moody GB, Muldrow W, Mark RG (1984) A noise stress test for arrhythmia detectors. Comput Cardiol 11(3):381–384

41. Fletcher GS (2019) Clinical epidemiology: the essentials. Lippincott Williams & Wilkins

42. Carrington AM, Fieguth PW, Qazi H, Holzinger A, Chen HH, Mayr F, Manuel DG (2020) A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. BMC Med Inform Decis Mak 20(1):1–12

43. Varewyck M, Martens J-P (2010) A practical approach to model selection for support vector machines with a Gaussian kernel. IEEE Trans Syst Man Cybern Part B (Cybern) 41(2):330–340

44. Sahadat MN, Jacobs EL, Morshed BI (2014) Hardware-efficient robust biometric identification from amplitude and interval features of 0.58 second limb (lead I) ECG signal using logistic regression classifier. In: Engineering in Medicine and Biology Society (EMBC), Chicago, IL, pp 1440–1443

45. Li T, Zhou M (2016) ECG classification using wavelet packet entropy and random forests. Entropy 18(8):285

46. Fukunaga K, Narendra PM (1975) A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans Comput 100(7):750–753

47. Li Q, Mark RG, Clifford GD (2007) Robust heart rate estimation from multiple asynchronous noisy sources using signal quality indices and a Kalman filter. Physiol Meas 29(1):15

48. Liu F, Wei S, Lin F, Jiang X, Liu C (2020) An overview of signal quality indices on dynamic ECG signal quality assessment. Feature Eng Comput Intell ECG Monit 33–54

49. Albaba A, Simões-Capela N, Wang Y, Hendriks RC, De Raedt W, Van Hoof C (2021) Assessing the signal quality of electrocardiograms from varied acquisition sources: a generic machine learning pipeline for model generation. Comput Biol Med 130:104164

Title: Fusing depth local dual-view features and dual-input transformer framework for improving the recognition ability of motion artifact-contaminated electrocardiogram
Authors: Shuaiying Yuan, Ziyang He, Jianhui Zhao, Zhiyong Yuan
Publication date: 21.09.2022
Publisher: Springer International Publishing
Published in: Complex & Intelligent Systems / Issue 1/2023
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI: https://doi.org/10.1007/s40747-022-00861-z

Features	G-SVM	LR	RF	KNN
SQI$_\mathrm{features}$	1.51e-3*	1.43e-7*	1.60e-6*	4.93e-3*
FT-IMF$_\mathrm{time}$	0.8971	4.04e-6*	2.35e-5*	2.76e-4*
FT-IMF$_\mathrm{freq}$	5.03e-4*	7.93e-3*	5.68e-6*	4.84e-3*
FT-IMF$_\mathrm{all}$	6.11e-3*	8.15e-3*	1.85e-4*	1.11e-3*

Testing set	G-SVM	LR	RF	KNN
Test-ab1	2.21e-5*	2.28e-6*	1.67e-3*	1.81e-3*
Test-ab2	4.39e-4*	1.28e-5*	2.25e-4*	2.88e-4*
Test-ab3	1.87e-5*	3.19e-6*	4.92e-4*	1.59e-5*
Test-ab4	2.01e-6*	1.91e-8*	2.21e-6*	1.46e-6*

Testing set	G-SVM	LR	RF	KNN
Test-ab1	5.16e-4*	2.29e-7*	3.79e-7*	6.75e-4*
Test-ab2	1.12e-3*	1.00e-4*	3.17e-6*	2.54e-3*
Test-ab3	8.14e-3*	1.10e-5*	2.70e-6*	9.89e-4*
Test-ab4	4.51e-3*	6.27e-5*	2.31e-6*	3.07e-4*


Parameters	Value	Notes
Batch_size	32	The number of samples fed into the model each time
$d_\mathrm{model}$	512	Input size of the transformer encoder layer
$d_\mathrm{in}$	2648	The input dimension of the linear layer
Num_module	6	The number of attention modules
Num_heads	6	The number of heads in each multi-head module
$N_\mathrm{seg}$	6 s	The length of each ECG segment
Dropout_ratio	0.1	The proportion of neurons randomly discarded during the training phase
Optimizer	–	The Adam optimizer
Learning rate	0.01	The initial value is 0.01; it is changed randomly between 0.01 and 0.001 every five epochs
k	2124	Dimensions that satisfy the condition ($P\ge 95\%$) after KPCA

SQIs	G-SVM			LR			RF			KNN		
	Se	$P_{+}$	Acc	Se	$P_{+}$	Acc	Se	$P_{+}$	Acc	Se	$P_{+}$	Acc
sSQI [47]	86.19	87.31	84.19	78.36	84.43	78.36	88.42	92.14	86.27	89.91	93.27	87.92
kSQI [47]	85.73	86.77	83.73	70.68	79.36	76.69	86.73	90.81	86.11	87.28	91.54	86.28
pSQI [10]	85.99	87.29	84.00	84.89	85.81	82.50	85.63	89.21	85.22	86.87	90.32	82.23
LpSQI [48]	84.44	86.74	83.79	78.74	83.13	76.22	83.76	87.00	83.34	80.55	89.45	83.26
MpSQI [48]	84.76	86.65	84.21	81.49	85.12	79.14	85.49	88.83	84.86	85.47	90.73	84.02
HpSQI [48]	83.43	88.75	83.44	78.74	86.83	78.61	84.45	87.89	83.16	82.45	89.74	84.77
FT-IMF$_\mathrm{time}$	82.57	89.41	85.99	80.68	85.52	81.72	85.33	86.95	84.25	84.29	90.97	83.05
FT-IMF$_\mathrm{freq}$	93.42	97.85	93.32	88.52	95.35	87.76	88.25	97.69	92.73	93.20	97.63	92.98
