Expert Systems with Applications

Volume 53, 1 July 2016, Pages 129-137
Dictionary learning for VQ feature extraction in ECG beats classification

https://doi.org/10.1016/j.eswa.2016.01.031

Highlights

  • We improve the dictionary learning algorithm for vector quantization of ECG.

  • The algorithm is employed to extract features from ECG.

  • The algorithm avoids interference from dirty data.

  • The algorithm is capable of increasing classification accuracy.

  • An initial cluster-center selection method is used to speed up the algorithm.

Abstract

Vector quantization (VQ) can extract features from electrocardiograms (ECG) efficiently, with the advantages of reduced dimensionality and increased accuracy. However, existing dictionary learning algorithms for vector quantization are sensitive to dirty data, which compromises classification accuracy. To tackle this problem, we propose a novel dictionary learning algorithm that employs k-medoids clustering optimized by k-means++ and builds dictionaries by searching for and using representative samples. This avoids interference from dirty data and thus boosts the classification performance of ECG systems based on vector quantization features. We apply our algorithm to vector quantization feature extraction for ECG beat classification and compare it with popular features such as the sampling point feature, the fast Fourier transform feature, and the discrete wavelet transform feature, as well as with our previous beat vector quantization feature. The results show that the proposed method yields the highest accuracy and can reduce the computational complexity of an ECG beat classification system. The proposed dictionary learning algorithm provides more efficient encoding of ECG beats and can improve ECG classification systems based on encoded features.

Introduction

An electrocardiogram (ECG) is a recording of the heart's electrical activity. The categories of heart beats (normal or different types of disorder) are important signs for diagnosing cardiovascular diseases. As manual analysis of beats is very laborious, automatic ECG classification has been studied and applied in practice.

Artificial intelligence and machine learning have been widely used in this domain, where features are extracted from ECG and used in both classifier training and classification. While many classifiers, such as the PSO-RBF classifier (Korürek & Doğan, 2010), the PSO-SVM classifier (Melgani & Bazi, 2008), the SVM classifier (Ye, Coimbra, & Kumar, 2010), and the LS-SVM classifier (Dutta, Chatterjee, & Munshi, 2010), have been adopted in these studies, the choice and extraction of proper features remain very important for achieving satisfactory classification performance.

At present, the three most popular types of features represent different aspects of ECG: temporal features, statistical features, and morphological features. Among them, morphological information is the most difficult to quantify, whether in manual analysis or automatic classification. For example, the duration of the QRS wave can be easily computed as $time_{end} - time_{start}$, but representing the morphological information of the wave is much harder. In the context of classification, features are regarded as observations, and they should express as much useful information as possible for effective classification.

On the other hand, dimensionality is another important property of a feature. Generally speaking, the feature dimensionality dominates the amount of computation in classifier learning because most classification algorithms have a time complexity of $O(d_L^n)$, where $n \geq 1$ and $d_L$ is the feature dimensionality. Reducing the feature dimensionality can therefore effectively reduce the amount of computation.
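As a rough illustration of this scaling, assume a hypothetical classifier whose training cost $T$ grows as $d_L^2$ (that is, $n = 2$) and hypothetical feature dimensionalities of 200 before and 20 after feature extraction. Neither value is taken from the experiments in this paper. Under these assumptions, the cost ratio would be:

```latex
\[
\frac{T(d_L = 200)}{T(d_L = 20)} \approx \frac{200^{2}}{20^{2}} = 100
\]
```

A tenfold reduction in dimensionality would thus correspond to roughly a hundredfold reduction in training cost under the assumed quadratic scaling.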

Therefore, we focus on morphological features that can both increase classification accuracy and reduce the computational complexity of classifier learning. Some recent reports on time series feature extraction have attracted our attention (Baydogan, Runger, & Tuv, 2013; Kim, Yazicioglu, Merken, Van Hoof, & Yoo, 2010; Wang, Liu, She, Nahavandi, & Kouzani, 2013a; Wang, Liu, She, Nahavandi, & Kouzani, 2013b; Wang, Sun, She, Kouzani, & Nahavandi, 2013c). They first divided beats into local parts and then encoded these local parts as features, which achieved state-of-the-art classification performance. The encoding process has two main steps: first, a dictionary is learned from the local parts of the dataset; then, the local parts of each beat are encoded as features by measuring their similarity to the dictionary.
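The second step of this encoding process can be sketched as follows. This is a minimal illustration, not the implementation used in the cited works: it assumes non-overlapping segments, Euclidean distance as the similarity measure, and a histogram of nearest codewords as the final feature. The function names and the toy dictionary values are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def split_into_segments(beat, seg_len):
    """Cut a beat (a list of samples) into non-overlapping local segments."""
    return [beat[i:i + seg_len] for i in range(0, len(beat) - seg_len + 1, seg_len)]

def encode_beat(beat, dictionary, seg_len):
    """Count how often each codeword is the nearest match to a local segment."""
    histogram = [0] * len(dictionary)
    for segment in split_into_segments(beat, seg_len):
        # Similarity measure (assumed): Euclidean distance to each codeword.
        nearest = min(range(len(dictionary)),
                      key=lambda k: math.dist(segment, dictionary[k]))
        histogram[nearest] += 1
    return histogram

# Toy dictionary of three codewords (hypothetical values, not learned from ECG data).
dictionary = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
beat = [0.1, 0.0, 0.9, 1.1, 0.0, 0.1, 0.1, 0.9]
feature = encode_beat(beat, dictionary, seg_len=2)
print(feature)
```

The length of the resulting feature equals the dictionary size rather than the number of samples in the beat, which is where the dimensionality reduction comes from.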

The dictionary plays an important role in feature extraction because it directly decides which code is matched. In such encoding methods (Baydogan, Runger, & Tuv, 2013; Kim, Yazicioglu, Merken, Van Hoof, & Yoo, 2010; Liu, Si, Wen, Zang, Song, & Lang, 2014; Wang, Liu, She, Nahavandi, & Kouzani, 2013a; Wang, Liu, She, Nahavandi, & Kouzani, 2013b; Wang, Sun, She, Kouzani, & Nahavandi, 2013c), k-means clustering and its extensions, such as sparse coding, have been widely used to learn the dictionary. However, dictionaries generated by these learning methods are very sensitive to dirty data, because their cluster centers are combinations of all training samples, clean and dirty alike.
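The sensitivity of a mean-based cluster center to dirty data can be seen on a toy cluster. The sketch below uses hypothetical two-dimensional points and Euclidean distance; it compares the mean of the cluster with its medoid, meaning the actual sample whose total distance to all other samples is smallest.

```python
import math

def mean_center(samples):
    """Cluster center as used by k-means: the coordinate-wise mean."""
    n = len(samples)
    return [sum(s[d] for s in samples) / n for d in range(len(samples[0]))]

def medoid_center(samples):
    """Cluster center as an actual sample: the one with the smallest
    total distance to every sample in the cluster."""
    return min(samples, key=lambda c: sum(math.dist(c, s) for s in samples))

# Four clean samples plus one dirty sample (all values are hypothetical).
clean = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0]]
dirty = clean + [[10.0, -10.0]]

print(mean_center(dirty))    # pulled toward the dirty sample
print(medoid_center(dirty))  # remains one of the clean samples
```

The mean moves away from the clean samples because the dirty sample contributes to it directly, whereas the medoid must be one of the existing samples and stays inside the clean group.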

To solve this problem, we propose a novel dictionary learning algorithm that employs k-medoids clustering optimized by k-means++ and builds dictionaries by searching for and using representative samples. Furthermore, we implement a classification system that uses the proposed dictionary learning algorithm for feature extraction.
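The general idea of combining k-medoids with k-means++ seeding can be sketched as below. This is a generic textbook-style sketch under stated assumptions, not the authors' algorithm: it assumes Euclidean distance, alternating assignment and medoid-update steps, and training samples without exact duplicates. The function names, parameters, and toy data are hypothetical.

```python
import math
import random

def kmeanspp_seeds(samples, k, rng):
    """Choose k initial medoid indices. Each later seed is drawn with probability
    proportional to its squared distance from the nearest seed chosen so far."""
    seeds = [rng.randrange(len(samples))]
    while len(seeds) < k:
        d2 = [min(math.dist(s, samples[i]) ** 2 for i in seeds) for s in samples]
        seeds.append(rng.choices(range(len(samples)), weights=d2, k=1)[0])
    return seeds

def k_medoids(samples, k, max_iter=100, seed=0):
    """Return k codewords, each of which is an actual training sample."""
    rng = random.Random(seed)
    medoids = kmeanspp_seeds(samples, k, rng)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, s in enumerate(samples):
            nearest = min(medoids, key=lambda m: math.dist(s, samples[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(math.dist(samples[c], samples[j])
                                           for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [samples[m] for m in medoids]

# Two well-separated toy groups (hypothetical values, not ECG segments).
samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
codebook = k_medoids(samples, k=2)
print(codebook)
```

Because every codeword is selected from the training samples rather than averaged over them, a dirty sample cannot shift a codeword unless it is itself chosen as a medoid.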

The rest of the paper is organized as follows. In Section 2, we introduce related work. In Section 3, we describe existing vector quantization methods for ECG and how our method differs from them. In Section 4, we propose our dictionary learning algorithm for vector quantization feature extraction. In Section 5, we present the classification framework based on the proposed method. We present experiments and results in Section 6. Finally, we conclude the paper in Section 7.

Related work

At present, there are many studies on automatic ECG classification. A classification system first extracts certain features from ECG signals and then classifies them with a classifier. Many classifiers have been widely used for this purpose, such as neural networks (Korürek & Doğan, 2010; Özbay & Tezel, 2010; Zidelmal, Amirou, Ould-Abdeslam, & Merckle, 2013) and SVM (Dutta, Chatterjee, & Munshi, 2010; Melgani & Bazi, 2008; Moavenian & Khorrami, 2010; Ye, Coimbra, & Kumar, 2010).

Vector quantization

Vector quantization learns a set of dictionaries to encode ECG local segments, which can be used directly in feature extraction for ECG classification. Existing vector quantization methods widely use the k-means dictionary learning algorithm.

In an ECG classification system, let X be a set of ECG local segments; one can simply split an object (a local segment of an ECG signal) and represent it as $X = [x_1, x_2, \ldots, x_n]$. The Bag of Words (BoW) method (Baydogan, Runger, Tuv, 2013, Wang, Liu, She, Nahavandi,

Our dictionary learning algorithm

Our dictionary learning algorithm is designed to support better codeword assignment by overcoming some disadvantages of the above-mentioned k-means vector quantization method, especially in its local representation and dictionary learning. Under k-means, local segments of a beat are approximated by cluster centers. A cluster center is the mean of the samples assigned to its cluster, and thus every training sample can affect the cluster centers. While k-means can generate smooth

Classification system

A beat classification system contains three major stages: preprocessing, feature extraction, and classification. We employ our alignment strategy (Liu et al., 2014) in preprocessing, apply the proposed dictionary learning method in feature extraction, and finally use a common classifier to classify beats. The process of the classification system is shown in Fig. 5.

Datasets

So far, many ECG datasets have been used in such studies, including MIT-BIH Arrhyt, MIT-BIH ST DB, MIT-BIH Long Term, MIT-BIH Sup. Vent., ESC STT, MIT-BIH NSR DB, Sudden Death, etc. The ECG in different datasets usually covers different typical diseases. For example, the MIT-BIH-AHA dataset emphasizes changes in the QRS complex, whereas the MIT-BIH-ST and SCT-STT datasets emphasize changes in the ST segment. At the same time, different datasets may come with differences arising from the

Conclusion

Features are among the most important factors in an ECG beat classification system. VQ methods have demonstrated their advantages in both dimensionality reduction and accuracy improvement. These methods represent local segments of beats, build a dictionary, and use it to quantize the local segments.

VQ features can improve the performance of an ECG classification system in terms of both accuracy and computational cost. However, we find that k-means clustering, which is widely adopted in the dictionary learning step of the

Acknowledgments

This work was supported by the Key Scientific and Technological Research Project of Jilin Province under Grant No. 20150204039GX, and the Key Scientific and Technological Project of Changchun under Grant No. 14KG064.

The authors would like to thank Pablo Laguna et al. for making their experimental datasets (Laguna et al., 1997) available, and A. L. Goldberger, L. A. N. Amaral, L. Glass, et al. for making their ECGPUWAVE code (Goldberger et al., 2000) available.

References (29)

  • Yu, S.-N., et al. (2007). Electrocardiogram beat classification based on wavelet transformation and probabilistic neural network. Pattern Recognition Letters.

  • Zang, M., et al. (2015). A novel topic feature for image scene classification. Neurocomputing.

  • Zidelmal, Z., et al. (2013). ECG beat classification using a cost sensitive classifier. Computer Methods and Programs in Biomedicine.

  • Arthur, D., et al. (2007). k-means++: The advantages of careful seeding. Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms.