Classification of imbalanced ECG beats using re-sampling techniques and AdaBoost ensemble classifier

https://doi.org/10.1016/j.bspc.2017.12.004

Abstract

Computer-aided heartbeat classification plays a significant role in the diagnosis of cardiac dysfunction. The electrocardiogram (ECG) provides vital information about the heartbeats. In this work, we propose a method for classifying the five groups of heartbeats recommended by the AAMI standard EC57:1998. Considering the nature of the ECG signal, we employ a decomposition technique suited to non-stationary and nonlinear signals, termed improved complete ensemble empirical mode decomposition (ICEEMD). Higher order statistics and sample entropy measures are then computed from the intrinsic mode functions (IMFs) obtained by applying ICEEMD to each ECG segment. Furthermore, three data level pre-processing techniques are applied to the extracted feature set to balance the distribution of heartbeat classes. Finally, these features are fed to an AdaBoost ensemble classifier to discriminate the heartbeats. Simulation results show that the proposed method provides a better solution to the class imbalance problem in heartbeat classification.

Introduction

Cardiovascular diseases (CVDs) are one of the primary causes of the global increase in the fatality rate. According to [1], 30% of global mortality is due to CVDs, and the burden is notably higher in countries with relatively low income levels. Approximately half of cardiovascular deaths are sudden cardiac deaths (SCDs), and cardiac arrhythmias cause most of these. Arrhythmia refers to any disturbance that alters the normal rhythmic functioning of the heart. The chance of SCD is higher in patients with a history of stroke or patients at cardiovascular risk [2]. Therefore, continuous monitoring of heart activity is becoming increasingly necessary. Moreover, detection of arrhythmia is essential for proper therapy and for preventing further deterioration of heart function.

The electrocardiogram (ECG) is an inexpensive and noninvasive diagnostic tool used to study the electrical activity of the heart. An ECG is an electrical signal representing the action potentials of various cardiac tissues, derived from electrodes placed on different parts of the body [3]. A portable ECG recorder called a Holter monitor is a very useful tool for analyzing the electrical activity of the heart over longer durations. However, examining the various abnormal rhythmic changes in a long ECG record is exhausting, even for an expert clinician. Hence computer-aided diagnosis plays a vital role in arrhythmia identification, owing to its effectiveness and robustness. Arrhythmia detection follows from identifying the classes of successive heartbeats in the given ECG. Therefore, heartbeat classification is an important step in recognizing arrhythmia.

Numerous algorithms have been developed for computer-aided heartbeat classification in the last two decades [[4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33]]. Feature extraction and classification are the key stages of heartbeat characterization and are widely explored in the literature. Features can be extracted directly from the morphology of the ECG signal (time-domain methods) or after applying a transformation. R–R intervals and the amplitude and duration of the QRS complex are the features that have received the most attention in the literature [[6], [7], [8], [9], [10], [11], [12], [13]]. However, these features are sensitive to the morphology and dynamics of the ECG. Transformation techniques then appeared as a solution. The advantage of transformation-based feature extraction is that it avoids the calculation of fiducial points for heartbeats. This approach alleviates the problem of exact time alignment of ECG waveforms, which is a merit for computer-aided diagnosis [34]. In this category, features based on spectral coefficients [[14], [15]] and subband coefficients [16] are well explored. However, the spectral features do not vary sufficiently with the pathological conditions, which poses a limitation [34].

The nonlinear, non-stationary behavior of the heartbeats is the major reason behind the difficulties faced by both time-domain and frequency-domain approaches [[34], [35], [36]]. The non-stationarities arise from irregularities in the formation and propagation of electrical conduction. They result in inconsistent inter-beat rhythms and waveform changes in the ECG, and they are especially pronounced in abnormal cardiac cycles. In addition, ECG signals are the outcome of nonlinear biological systems. Hence, time-frequency methods have gained considerable interest in heartbeat discrimination.

The short-time Fourier transform (STFT) and the Wigner–Ville distribution (WVD) are used in [17] for discriminating shockable from non-shockable cardiac rhythms. Wavelet-based feature extraction schemes are explored extensively among time-frequency methods [[4], [5], [18], [19], [20], [21], [22], [23]] owing to their efficiency in analyzing non-stationary signals. An important difficulty with wavelets lies in selecting the mother wavelet and fixing the level of decomposition. Therefore, in recent years an adaptive nonlinear decomposition method called empirical mode decomposition (EMD) has gathered much attention in heartbeat signal analysis. The main advantage of EMD is that it decomposes the signal into AM–FM oscillatory modes using the local characteristics of the signal, without any presumptions. The modes derived from the EMD process are complete and partially orthogonal. It is the first adaptive and local approach in time-frequency analysis [37]. Some algorithms have exploited EMD for effective feature extraction and arrhythmia recognition [[24], [25], [26], [27], [28]].

In recent years, nonlinear features such as higher order cumulants, entropy measures, higher order spectra and Lempel–Ziv (LZ) complexity have often been utilized for heartbeat classification [[29], [30], [31]]. Improved versions of EMD are also used for bio-signal analysis. In [32], features based on ensemble EMD (EEMD) are used for discriminating ECG heartbeat signals. We can observe that nonlinear and non-stationary decomposition, together with nonlinear features, plays a crucial role in successful classification.
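To make the kind of nonlinear feature mentioned above concrete, the following is a minimal sketch of two higher order statistics (skewness and kurtosis) and of sample entropy, written in plain Python. It is not the feature extractor used in the paper: the function names, the embedding dimension `m=2` and the tolerance `r = 0.2 * std` are assumptions chosen because they are common defaults, and the input is any one-dimensional sequence (for example a single decomposition mode).

```python
import math

def central_moment(x, k):
    """k-th central moment of a sequence of numbers."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** k for v in x) / len(x)

def skewness(x):
    """Third standardized moment (asymmetry of the distribution)."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def kurtosis(x):
    """Fourth standardized moment (3.0 for a Gaussian)."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r) with r = r_factor * std(x) and Chebyshev distance.
    m and r_factor are assumed defaults, not values taken from the paper."""
    n = len(x)
    r = r_factor * math.sqrt(central_moment(x, 2))

    def count_matches(length):
        # Use the first n - m templates for both lengths so counts are comparable.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)        # matching template pairs of length m
    a = count_matches(m + 1)    # matching template pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")     # too few matches to estimate the entropy
    return -math.log(a / b)
```

A perfectly periodic sequence gives a sample entropy of zero, while an irregular sequence gives a larger value, which is why such measures are attractive for separating regular from irregular beats.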

An additional challenge in dealing with medical data is class imbalance. It arises because the limited availability of rare classes results in a non-uniform distribution of abnormal data. In this situation the results are biased towards the majority class, which challenges supervised machine learning and increases the misdiagnosis rate [38]. Works done so far have hardly explored heartbeat classification with this limitation in mind.
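The bias towards the majority class can be illustrated with a small sketch. The labels below are hypothetical toy data (95 majority beats labelled "N" and 5 rare beats labelled "F"), not counts from the paper, and the "classifier" simply predicts the majority class every time.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Recall (sensitivity) for each class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical toy labels: 95 majority beats and 5 rare beats.
y_true = ["N"] * 95 + ["F"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["N"] * 100

acc = accuracy(y_true, y_pred)              # 0.95
recall = per_class_recall(y_true, y_pred)   # {"N": 1.0, "F": 0.0}
```

Overall accuracy looks high even though every rare beat is missed, which is why per-class measures matter when the classes are imbalanced.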

In this paper, we consider heartbeat classification together with class imbalance. We performed simple statistical tests to understand the nature of ECG signals. From these, we observed that the ECG is a non-stationary and non-Gaussian signal stemming from nonlinear systems. Therefore, we employed a nonlinear feature extraction scheme to classify five groups of ECG heartbeats based on the recommendation of the Association for the Advancement of Medical Instrumentation (AAMI). We used an adaptive nonlinear decomposition method called improved complete ensemble empirical mode decomposition (ICEEMD) [39]. It is a data-driven method for analyzing non-stationary signals originating from a nonlinear system, and it decomposes the signals into approximately orthogonal individual oscillatory modes. We then calculated higher order statistics (HOS) and the sample entropy measure from these modes to extract the concealed information from the ECG. Next, we applied data level sampling techniques, namely random sampling, the synthetic minority oversampling technique (SMOTE) and distribution based balancing, to understand the impact of class imbalance at the training level. This feature set is then passed to an ensemble classifier called AdaBoost [40], which has been less explored for heartbeat classification. Furthermore, we employed a feature selection scheme to reduce the number of irrelevant features without decreasing the classification performance.
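As an illustration of one of the data level sampling techniques named above, the sketch below shows the core idea of SMOTE: a synthetic minority sample is created by interpolating between a minority sample and one of its nearest minority neighbours. This is a simplified, assumed version in plain Python and not the configuration used in the paper; the function name, the neighbour count `k=3` and the fixed `seed` are illustrative choices, and at least two minority samples are required.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    Simplified sketch: k and seed are assumed values."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-dimensional feature vectors of a rare class.
rare = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(rare, n_new=6)
```

Because each synthetic point lies on a line segment between two existing minority samples, the new samples stay inside the region already occupied by the rare class. Random oversampling, by contrast, would only duplicate existing samples.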

The rest of the paper is organized as follows: the ECG data set, experimental setup and theoretical background of the methodology are presented in Section 2. Section 3 discusses the simulation results and compares them with existing works. Future challenges are presented in Section 4. The conclusion of the work is presented in Section 5.

Materials and method

A typical pattern classification system includes data collection, pre-processing, feature extraction, feature selection, and classification. Fig. 1 illustrates the block diagram of the proposed methodology. In this section, the theoretical background of the employed techniques is discussed.
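For the classification stage of such a pipeline, the following is a minimal sketch of the boosting idea behind AdaBoost, using one-feature threshold stumps as weak learners. It is a binary version with labels in {-1, +1}, written in plain Python under assumed names; the paper classifies five heartbeat groups, so this is a simplified illustration of the principle and not the authors' classifier.

```python
import math

def fit_stump(X, y, w):
    """Best threshold stump (error, feature, threshold, polarity) under weights w."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                pred = [pol if row[f] <= thr else -pol for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(stump, row):
    _, f, thr, pol = stump
    return pol if row[f] <= thr else -pol

def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Raise the weights of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Hypothetical one-feature data that no single threshold can separate.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
single = adaboost_fit(X, y, n_rounds=1)
boosted = adaboost_fit(X, y, n_rounds=3)
```

On this toy data a single stump misclassifies two samples, whereas the weighted vote of three stumps fits all six, because each round concentrates the sample weights on the beats the previous stumps got wrong.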

Results and discussion

This section discusses the simulation results of the proposed methodology illustrated in Fig. 1. Our focus is mainly on class imbalance and its impact on heartbeat classification. We perform an empirical study of how various data level pre-processing algorithms, together with an ensemble classifier, help to alleviate the class imbalance problem. The proposed method is evaluated on the MIT-BIH arrhythmia database. In computer-aided diagnosis, feature extraction is of

Future challenges

The present work demonstrates ECG beat classification based on the AAMI recommendations. However, some open issues remain. One such issue is the intra-patient paradigm, where heartbeats of the same patient are likely to appear in both the training and the testing data. This situation may lead to biased results. Secondly, fixed beat length segmentation is not always preferable, owing to fast- and slow-varying heart rhythms. The study of adaptive beat size segmentation is

Conclusion

Computer-aided heartbeat classification is studied in this work. Class imbalance has a significant effect on heartbeat classification. In our work, we performed an empirical analysis of how to alleviate class imbalance in heartbeat classification. We employed a nonlinear, data-adaptive decomposition method, namely ICEEMD, to extract features from ECG heartbeats. HOS and sample entropy parameters are then calculated from the selected modes obtained from ICEEMD. Next, three data level sampling

References (72)

  • A.R. Hassan et al., Automatic sleep scoring using statistical features in the EMD domain and ensemble methods, Biocybern. Biomed. Eng. (2016)
  • V. Almenar et al., A new adaptive scheme for ECG enhancement, Signal Process. (1999)
  • M.A. Colominas et al., Improved complete ensemble EMD: a suitable tool for biomedical signal processing, Biomed. Signal Process. Control (2014)
  • D. Kwiatkowski et al., Testing the null hypothesis of stationarity against the alternative of a unit root: how sure are we that economic time series have a unit root?, J. Econom. (1992)
  • R. Djemili et al., Application of empirical mode decomposition and artificial neural network for the classification of normal and epileptic EEG signals, Biocybern. Biomed. Eng. (2016)
  • P. Radivojac et al., Classification and knowledge discovery in protein databases, J. Biomed. Inf. (2004)
  • P. Bermejo et al., Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets, Expert Syst. Appl. (2011)
  • N.C. Oza et al., Classifier ensembles: select real-world applications, Inf. Fusion (2008)
  • J. Pohjalainen et al., Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits, Comput. Speech Lang. (2015)
  • S. Mendis et al., Global Atlas on Cardiovascular Disease Prevention and Control (2011)
  • L. Sörnmo et al. (2005)
  • M.K. Das et al., ECG beats classification using mixture of features (2014)
  • R.G. Afkhami et al., Cardiac arrhythmia classification using statistical and mixture modeling features of ECG signals, Pattern Recognit. Lett. (2016)
  • Y.H. Hu et al., A patient-adaptable ECG beat classifier using a mixture of experts approach, IEEE Trans. Biomed. Eng. (1997)
  • G. de Lannoy et al., Feature relevance assessment in automatic inter-patient heart beat classification, Biosignals (2010)
  • T. Mar et al., Optimization of ECG classification by means of feature selection, IEEE Trans. Biomed. Eng. (2011)
  • P. de Chazal et al., A patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features, IEEE Trans. Biomed. Eng. (2006)
  • G. Doquire et al., Feature selection for interpatient supervised heart beat classification, Comput. Intell. Neurosci. (2011)
  • P. De Chazal et al., Automatic classification of heartbeats using ECG morphology and heartbeat interval features, IEEE Trans. Biomed. Eng. (2004)
  • C. Mead et al., Expanded frequency-domain ECG waveform processing: integration into a new version of ARGUS/2H, Proceedings of Computers in Cardiology (1982)
  • K.-i. Minami et al., Real-time discrimination of ventricular tachyarrhythmia with Fourier-transform neural network, IEEE Trans. Biomed. Eng. (1999)
  • V.X. Afonso et al., ECG beat detection using filter banks, IEEE Trans. Biomed. Eng. (1999)
  • V.X. Afonso et al., Detecting ventricular fibrillation, IEEE Eng. Med. Biol. Mag. (1995)
  • C. Ye et al., Heartbeat classification using morphological and dynamic features of ECG signals, IEEE Trans. Biomed. Eng. (2012)
  • M. Thomas et al., Classification of cardiac arrhythmias based on dual tree complex wavelet transform, 2014 International Conference on Communications and Signal Processing (ICCSP) (2014)
  • M.O.A. Omar et al., Application of the empirical mode decomposition to ECG and HRV signals for congestive heart failure classification, 2011 1st Middle East Conference on Biomedical Engineering (MECBME) (2011)