Dictionary learning for VQ feature extraction in ECG beats classification
Introduction
An electrocardiogram (ECG) is a recording of the heart's electrical activity. The categories of heart beats (normal or different types of disorder) are important signs for diagnosing cardiovascular diseases. Because manual analysis of beats is very laborious, automatic ECG classification has been studied and applied in practice.
Artificial intelligence and machine learning have been widely used in this domain, where features are extracted from the ECG and used in both classifier training and classification. While many classifiers, such as the PSO-RBF classifier (Korürek & Doğan, 2010), the PSO-SVM classifier (Melgani & Bazi, 2008), the SVM classifier (Ye, Coimbra, & Kumar, 2010), and the LS-SVM classifier (Dutta, Chatterjee, & Munshi, 2010), have been adopted in these studies, the choice and extraction of proper features remain very important for achieving satisfactory classification performance.
At present, the three most popular types of features, each representing a different aspect of the ECG, are temporal, statistical, and morphological features. Among them, morphological information is the most difficult to quantify, whether in manual analysis or automatic classification. For example, the duration of the QRS wave can be easily computed, but representing the morphological information of the wave is much harder. In the context of classification, features are regarded as observations, and they should express as much useful information as possible for effective classification.
Dimensionality is another important property of a feature. Generally speaking, the feature dimensionality dominates the amount of computation in classifier learning because most classification algorithms have a time complexity of O(d_L^n), where n ≥ 1 and d_L is the feature dimensionality. Reducing the feature dimensionality can therefore effectively reduce the amount of computation.
Therefore, we focus on morphological features that can both increase classification accuracy and reduce the computational complexity of classifier learning. Some recent reports on time series feature extraction have attracted our attention (Baydogan, Runger, & Tuv, 2013; Kim, Yazicioglu, Merken, Van Hoof, & Yoo, 2010; Wang, Liu, She, Nahavandi, & Kouzani, 2013a, 2013b; Wang, Sun, She, Kouzani, & Nahavandi, 2013c). They first divide beats into local parts and then encode these local parts as features, achieving state-of-the-art classification performance. The encoding process has two main steps: first, a dictionary is learned from the local parts in the dataset; then, the local parts of each beat are encoded as features by measuring their similarity to the dictionary.
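The second encoding step can be illustrated with a minimal sketch. It assumes a sliding-window split, squared Euclidean distance as the similarity measure, and a Bag-of-Words histogram as the final feature; the helper names, the toy dictionary, and the toy beat are all hypothetical and are not taken from the cited methods.

```python
from collections import Counter


def split_into_segments(beat, width, step):
    """Slide a fixed-width window over one beat and return its local segments."""
    return [beat[i:i + width] for i in range(0, len(beat) - width + 1, step)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length segments."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(segment, dictionary):
    """Index of the dictionary entry closest to the segment."""
    return min(range(len(dictionary)),
               key=lambda k: squared_distance(segment, dictionary[k]))


def encode_beat(beat, dictionary, width, step):
    """Histogram of matched codewords: one count per dictionary entry."""
    codes = [nearest_codeword(s, dictionary)
             for s in split_into_segments(beat, width, step)]
    counts = Counter(codes)
    return [counts.get(k, 0) for k in range(len(dictionary))]


# Toy data (assumed): a flat codeword and a rising codeword.
dictionary = [[0.0, 0.0, 0.0], [0.0, 0.5, 1.0]]
beat = [0.0, 0.0, 0.0, 0.1, 0.6, 1.0]
feature = encode_beat(beat, dictionary, width=3, step=1)
print(feature)
```

In this sketch the feature length equals the number of codewords rather than the number of samples in the beat, which is how such an encoding can lower the feature dimensionality.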
The dictionary plays an important role in feature extraction because it directly decides which code is matched. In such encoding methods (Baydogan, Runger, & Tuv, 2013; Kim, Yazicioglu, Merken, Van Hoof, & Yoo, 2010; Liu, Si, Wen, Zang, Song, & Lang, 2014; Wang, Liu, She, Nahavandi, & Kouzani, 2013a, 2013b; Wang, Sun, She, Kouzani, & Nahavandi, 2013c), k-means clustering and its extensions, such as sparse coding, have been widely used to learn the dictionary. However, dictionaries generated by these learning methods are very sensitive to dirty data, because their cluster centers are combinations of all training samples, clean and dirty alike.
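A toy illustration of this sensitivity, under assumed data: three similar segments and one corrupted segment are placed in the same cluster. The mean center is pulled toward the corrupted segment, while the medoid (the member with the smallest total distance to the others, shown here only for contrast) remains an actual clean segment. The function names and values are hypothetical.

```python
def dist(a, b):
    """Euclidean distance between two equal-length segments."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_center(samples):
    """Component-wise mean, as used for a k-means cluster center."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def medoid(samples):
    """The sample whose total distance to all samples is smallest."""
    return min(samples, key=lambda c: sum(dist(c, s) for s in samples))


# Three similar segments plus one corrupted ("dirty") segment, all assumed.
segments = [
    [0.0, 1.0, 0.0],
    [0.0, 1.1, 0.0],
    [0.0, 0.9, 0.0],
    [5.0, -5.0, 5.0],  # dirty sample
]

mean_c = mean_center(segments)
medoid_c = medoid(segments)
print("mean center:", mean_c)
print("medoid:     ", medoid_c)
```

The mean center here matches none of the observed segments, whereas the medoid is one of the clean ones.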
To solve this problem, we propose a novel dictionary learning algorithm that employs k-medoids clustering optimized by k-means++ and builds dictionaries by searching for and using representative samples. Furthermore, we implement a classification system that applies the proposed dictionary learning algorithm in feature extraction.
The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 describes existing vector quantization methods for ECG and how our method differs from them. Section 4 proposes our dictionary learning algorithm for vector quantization feature extraction. Section 5 presents the classification framework based on the proposed method. Section 6 presents experiments and results. Finally, Section 7 concludes the paper.
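The general idea of combining k-medoids with k-means++ seeding can be sketched as follows. This is a generic alternating (assign, then update) k-medoids with distance-squared weighted seeding, not the authors' exact algorithm, whose details are not given in this excerpt. The function names, the Euclidean distance, and the toy segments are assumptions, and the sketch assumes the data contain at least k distinct samples.

```python
import random


def dist(a, b):
    """Euclidean distance between two equal-length segments."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmeanspp_seed(samples, k, rng):
    """Pick k initial medoid indices with k-means++ style D^2 weighting."""
    chosen = [rng.randrange(len(samples))]
    while len(chosen) < k:
        # Weight each sample by its squared distance to the nearest seed.
        weights = [min(dist(s, samples[c]) for c in chosen) ** 2
                   for s in samples]
        chosen.append(rng.choices(range(len(samples)), weights=weights, k=1)[0])
    return chosen


def k_medoids(samples, k, seed=0, max_iter=100):
    """Return k codewords, each of which is an actual training sample."""
    rng = random.Random(seed)
    medoids = kmeanspp_seed(samples, k, rng)
    for _ in range(max_iter):
        # Assignment step: attach every sample to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, s in enumerate(samples):
            nearest = min(medoids, key=lambda m: dist(s, samples[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with least total distance.
        new_medoids = [
            min(members,
                key=lambda c: sum(dist(samples[c], samples[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [samples[m] for m in medoids]


# Toy local segments (assumed): two loose groups.
segments = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
dictionary = k_medoids(segments, k=2)
print(dictionary)
```

Because every codeword is drawn from the training samples, the dictionary never contains an averaged shape. Which samples are selected depends on the random seeding.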
Related work
At present, there are many studies on automatic ECG classification. A classification system first extracts certain features from ECG signals and then classifies them with a classifier. Many classifiers have been used for this purpose, such as neural networks (Korürek & Doğan, 2010; Özbay & Tezel, 2010; Zidelmal, Amirou, Ould-Abdeslam, & Merckle, 2013) and SVMs (Dutta, Chatterjee, & Munshi, 2010; Melgani & Bazi, 2008; Moavenian & Khorrami, 2010; Ye, Coimbra, & Kumar, 2010).
Vector quantization
Vector quantization learns a set of dictionaries to encode ECG local segments, which can be used directly in feature extraction for ECG classification. Existing vector quantization methods widely use the k-means dictionary learning algorithm.
In an ECG classification system, let X be a set of ECG local segments; one can simply split an object (a local segment of an ECG signal) and represent it as . The Bag of Words (BoW) method (Baydogan, Runger, & Tuv, 2013; Wang, Liu, She, Nahavandi,
Our dictionary learning algorithm
Our dictionary learning algorithm is designed to support better codeword assignment by overcoming some disadvantages of the above-mentioned k-means vector quantization method, especially in its local representation and dictionary learning. Under k-means, local segments of a beat are approximated by cluster centers. A cluster center is the mean of the samples assigned to the cluster, and thus every training sample can affect the cluster centers. While k-means can generate smooth
Classification system
The beat classification system contains three major stages: preprocessing, feature extraction, and classification. We employ our alignment strategy (Liu et al., 2014) in preprocessing, apply the proposed dictionary learning method in feature extraction, and finally use a common classifier to classify beats. The overall process of the classification system is shown in Fig. 5.
Datasets
So far, many ECG datasets have been used in such studies, such as MIT-BIH Arrhyt, MIT-BIH ST DB, MIT-BIH Long Term, MIT-BIH Sup. Vent., ESC STT, MIT-BIH NSR DB, Sudden Death, etc. The ECGs in different datasets usually contain different typical diseases. For example, the MIT-BIH-AHA dataset emphasizes changes in the QRS complex, whereas the MIT-BIH-ST and SCT-STT datasets emphasize changes in the ST segment. At the same time, different datasets may come with differences arising from the
Conclusion
Features are one of the most important factors in an ECG beat classification system. VQ methods have demonstrated their advantages in both reducing dimensionality and increasing accuracy. These methods represent local segments of beats, build a dictionary, and use it to quantize the local segments.
VQ features can improve the performance of an ECG classification system in terms of both accuracy and computational cost. However, we find that k-means clustering, which is widely adopted in the dictionary learning step of the
Acknowledgments
This work was supported by the Key Scientific and Technological Research Project of Jilin Province under Grant No.20150204039GX, and the Key Scientific and Technological Project of Changchun under Grant No.14KG064.
The authors would like to thank Laguna, Pablo, et al. for making their experimental datasets (Laguna et al., 1997) available, and Goldberger A L, Amaral L A N, Glass L, et al. for making their ECGPUWAVE code (Goldberger et al., 2000) available.
References (29)
- et al. An objective approach to cluster validation. Pattern Recognition Letters (2006)
- et al. Correlation technique and least square support vector machine combine for frequency domain based ECG beat classification. Medical Engineering & Physics (2010)
- et al. A comparative study of DWT, CWT and DCT transformations in ECG arrhythmias classification. Expert Systems with Applications (2010)
- et al. ECG beat classification using particle swarm optimization and radial basis function neural network. Expert Systems with Applications (2010)
- et al. A qualitative comparison of artificial neural networks and support vector machines in ECG arrhythmias classification. Expert Systems with Applications (2010)
- et al. A new method for classification of ECG arrhythmias using neural network with adaptive activation function. Digital Signal Processing (2010)
- et al. A simple and fast algorithm for k-medoids clustering. Expert Systems with Applications (2009)
- et al. Biomedical time series clustering based on non-negative sparse coding and probabilistic topic model. Computer Methods and Programs in Biomedicine (2013)
- et al. Bag-of-words representation for biomedical time series classification. Biomedical Signal Processing and Control (2013)
- et al. Unsupervised mining of long time series based on latent topic model. Neurocomputing (2013)
- Electrocardiogram beat classification based on wavelet transformation and probabilistic neural network. Pattern Recognition Letters
- A novel topic feature for image scene classification. Neurocomputing
- ECG beat classification using a cost sensitive classifier. Computer Methods and Programs in Biomedicine
- k-means++: The advantages of careful seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms
Cited by (43)
A lightweight convolutional neural network hardware implementation for wearable heart rate anomaly detection
2023, Computers in Biology and Medicine
Discriminative dictionary learning based on statistical methods
2022, Statistical Modeling in Machine Learning: Concepts and Applications
GB-SVNN: Genetic BAT assisted support vector neural network for arrhythmia classification using ECG signals
2021, Journal of King Saud University - Computer and Information Sciences
Citation Excerpt: The challenges of the arrhythmia classification strategies empower the need for the effective classification standard. Liu et al. (2016) proposed a dictionary learning algorithm for VQ feature extraction in ECG beats classification that utilizes the k-medoids cluster determined optimally using the k-means++ for developing the dictionaries. The main advantage of the method is that the interference of the dirty data was rejected and the dimension was reduced.
The effect of dictionary learning on weight update of AdaBoost and ECG classification
2020, Journal of King Saud University - Computer and Information Sciences
Citation Excerpt: It was proven in the study that the proposed k-LiMapS method has high compression rate within a controlled approximation measure. Liu et al. (2016) proposed a dictionary learning algorithm for vector quantization to extract features from ECG signals. The proposed method was also used to eliminate noise from the signals.
A secure fuzzy extractor based biometric key authentication scheme for body sensor network in Internet of Medical Things
2020, Computer Communications
Citation Excerpt: A few of the previous works in feature extraction are analysed here. Liu et al. [11] proposed an efficient feature extraction technique for ECG signals using vector quantization principle. Vector quantization was implemented with the dictionary learning algorithm for efficient classification of abnormalities from ECG signal.
A novel method for identifying electrocardiograms using an independent component analysis and principal component analysis network
2020, Measurement: Journal of the International Measurement Confederation
Citation Excerpt: In these systems, a feature extraction stage designed to decipher the hidden information in original signals is critical and affects the performance of the whole system to a high degree. The common features of ECG signals can be typically divided into three categories which are time-domain features, frequency-domain features and statistical features: (i) time-domain features [8,12,16] are the intuitive pieces of information that mainly reflect the physical information of the ECG signals, including the amplitude and intervals between the P, Q, R, S and T peaks; (ii) frequency-domain features [9,10,12,14,16] are usually certain relative parameters obtained by specific transformations such as the wavelet transform, Fourier transform, discrete cosine transform and Hilbert transform; (iii) statistical features [10–13] contain a variety of statistics calculated from ECG signals, for which principal component analysis (PCA) and independent components analysis (ICA) are the most prevalent linear statistical methods. In terms of the classifiers, numerous machine learning algorithms such as support vector machines (SVMs) [10–12], various neural networks [13–15], clustering algorithms and ensemble learning [16] have been widely employed with notable advantages in the field of ECG classification.