Classification of imbalanced ECG beats using re-sampling techniques and AdaBoost ensemble classifier
Introduction
Cardiovascular diseases (CVDs) are one of the primary causes of the rising global fatality rate. According to [1], CVDs account for 30% of global mortality, and their burden is notably higher in countries with relatively low income levels. Approximately half of cardiovascular deaths are sudden cardiac deaths (SCDs), most of which are caused by cardiac arrhythmias. Arrhythmia refers to any disturbance of the normal rhythmic functioning of the heart. The risk of SCD is higher in patients with a history of stroke or patients at cardiovascular risk [2]. Therefore, continuous monitoring of heart activity is becoming increasingly necessary. Moreover, arrhythmia detection is essential for proper therapy that prevents further deterioration of cardiac function.
Electrocardiogram (ECG) is an inexpensive, noninvasive diagnostic tool for studying the electrical activity of the heart. The ECG is an electrical signal, recorded from electrodes placed on different parts of the body, that represents the action potentials of various cardiac tissues [3]. A portable ECG recorder called the Holter monitor is very useful for analyzing the electrical activity of the heart over long durations. However, inspecting a long ECG record for abnormal rhythmic changes is exhausting, even for an expert clinician. Hence, computer-aided diagnosis plays a vital role in arrhythmia identification, owing to its effectiveness and robustness. Arrhythmia detection relies on identifying the classes of successive heartbeats in the given ECG; therefore, heartbeat classification is an important step in recognizing arrhythmia.
Numerous algorithms have been developed for computer-aided heartbeat classification in the last two decades [[4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33]]. Feature extraction and classification are the key stages of heartbeat characterization and have been widely explored in the literature. Features can be extracted directly from the morphology of the ECG signal (time-domain methods) or after applying a transformation. R–R intervals and the amplitude and duration of the QRS complex are the features that have received the most attention [[6], [7], [8], [9], [10], [11], [12], [13]]. However, these features are sensitive to the morphology and dynamics of the ECG. Transformation techniques were then proposed as a solution. The advantage of transformation-based feature extraction is that it avoids computing fiducial points for each heartbeat. This alleviates the problem of exactly time-aligning ECG waveforms, which benefits computer-aided diagnosis [34]. In this category, features based on spectral coefficients [[14], [15]] and subband coefficients [16] are well explored. However, spectral features do not vary sufficiently with pathological conditions, which limits their usefulness [34].
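To make the time-domain R–R features concrete, the sketch below computes three per-beat interval features from detected R-peak positions: the pre-RR interval, the post-RR interval and a local average RR. The function name `rr_features`, the 10-interval local window and this particular feature set are illustrative assumptions, not the definitions used in the cited works; R-peak detection is assumed to have been done already.

```python
import numpy as np

def rr_features(r_peaks, fs, local_window=10):
    """Per-beat R-R interval features from sorted R-peak sample indices.

    Hypothetical helper: returns one row (pre_rr, post_rr, local_avg_rr) in
    seconds for every beat that has both a preceding and a following beat.
    """
    r = np.asarray(r_peaks, dtype=float)
    rr = np.diff(r) / fs  # successive R-R intervals in seconds
    feats = []
    for i in range(1, len(r) - 1):
        pre_rr = rr[i - 1]   # interval from beat i-1 to beat i
        post_rr = rr[i]      # interval from beat i to beat i+1
        # Mean of up to `local_window` intervals ending at the current beat.
        start = max(0, i - local_window)
        local_avg = rr[start:i].mean()
        feats.append((pre_rr, post_rr, local_avg))
    return np.array(feats)
```

For example, `rr_features([0, 360, 720, 1000, 1440], fs=360)` yields three rows, one per interior beat; the third row's short pre-RR interval (about 0.78 s) against a local average near 0.93 s is the kind of contrast such features use to flag premature beats.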
The nonlinear, non-stationary behavior of heartbeats is the major reason for the difficulties faced by both time-domain and frequency-domain approaches [[34], [35], [36]]. The non-stationarities arise from irregularities in the formation and propagation of electrical conduction. They cause inconsistent rhythms and waveform changes from beat to beat, and they are especially pronounced in abnormal cardiac cycles. In addition, ECG signals are the output of nonlinear biological systems. Hence, time-frequency methods have gained considerable interest for heartbeat discrimination.
Short-time Fourier transform (STFT) and Wigner–Ville distribution (WVD) are used in [17] to discriminate shockable from non-shockable cardiac rhythms. Among time-frequency methods, wavelet-based feature extraction schemes have been explored extensively [[4], [5], [18], [19], [20], [21], [22], [23]] owing to their efficiency in analyzing non-stationary signals. An important difficulty with wavelets is selecting the mother wavelet and fixing the decomposition level. Therefore, in recent years an adaptive nonlinear decomposition method called empirical mode decomposition (EMD) has gathered much attention in heartbeat signal analysis. The main advantage of EMD is that it decomposes a signal into AM–FM oscillatory modes using only the local characteristics of the signal, without any prior assumptions. The modes derived from the EMD process are complete and approximately orthogonal. EMD is the first adaptive and local approach to time-frequency analysis [37]. Several algorithms have exploited EMD for effective feature extraction and arrhythmia recognition [[24], [25], [26], [27], [28]].
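The sifting idea behind EMD can be sketched in NumPy. This is plain EMD, not the ICEEMD variant adopted later in the paper, and it rests on simplifying assumptions: a fixed number of sifting passes (`n_sift`) instead of a formal stopping criterion, the signal endpoints reused as envelope knots, and a hand-written natural cubic spline for the upper and lower envelopes. All function names are hypothetical.

```python
import numpy as np

def natural_cubic_spline(xk, yk, x):
    """Evaluate a natural cubic spline through knots (xk, yk) at points x."""
    xk, yk, x = (np.asarray(a, dtype=float) for a in (xk, yk, x))
    n = len(xk)
    h = np.diff(xk)
    # Linear system for the knot second derivatives M, with M[0] = M[-1] = 0.
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = [h[i - 1], 2 * (h[i - 1] + h[i]), h[i]]
        b[i] = 6 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, b)
    # Interval index of each query point, then the standard cubic piece.
    j = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)
    dl, dr, hj = x - xk[j], xk[j + 1] - x, h[j]
    return ((M[j] * dr**3 + M[j + 1] * dl**3) / (6 * hj)
            + (yk[j] / hj - M[j] * hj / 6) * dr
            + (yk[j + 1] / hj - M[j + 1] * hj / 6) * dl)

def local_extrema(x):
    """Indices of interior local maxima and minima."""
    mid = x[1:-1]
    maxima = np.where((mid > x[:-2]) & (mid >= x[2:]))[0] + 1
    minima = np.where((mid < x[:-2]) & (mid <= x[2:]))[0] + 1
    return maxima, minima

def emd(x, max_imfs=8, n_sift=10):
    """Plain EMD: returns (imfs, residual) with imfs.sum(0) + residual == x."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    imfs, residual = [], x.copy()
    for _mode in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # residual is (close to) monotonic: stop decomposing
        h = residual.copy()
        for _pass in range(n_sift):  # fixed number of sifting passes
            maxima, minima = local_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            # Endpoints are added as knots so both envelopes span the signal.
            up = np.r_[0, maxima, len(h) - 1]
            lo = np.r_[0, minima, len(h) - 1]
            mean_env = (natural_cubic_spline(t[up], h[up], t)
                        + natural_cubic_spline(t[lo], h[lo], t)) / 2
            h = h - mean_env  # remove the local mean, keep the fast oscillation
        imfs.append(h)
        residual = residual - h
    return np.array(imfs), residual
```

By construction the extracted modes plus the final residual add back to the input, mirroring the completeness property mentioned above; the quality of individual modes, especially near the record edges, depends heavily on the boundary handling and stopping rule assumed here.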
In recent years, nonlinear features such as higher order cumulants, entropy measures, higher order spectra and Lempel–Ziv (LZ) complexity have often been used for heartbeat classification [[29], [30], [31]]. Improved versions of EMD have also been used for biosignal analysis; for example, in [32] features based on ensemble EMD (EEMD) are used to discriminate ECG heartbeat signals. These studies suggest that nonlinear, non-stationary decomposition combined with nonlinear features plays a crucial role in successful classification.
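As an illustration of such nonlinear features, the sketch below computes normalized higher order statistics (variance, skewness and excess kurtosis, i.e. the second- to fourth-order cumulants scaled by the variance) and sample entropy for a 1-D signal such as a single beat or a decomposed mode. The embedding dimension `m = 2` and tolerance `r = 0.2` times the standard deviation are common defaults assumed here; the exact HOS and entropy parameters used in the paper are not given in this excerpt, and the function names are hypothetical.

```python
import numpy as np

def hos_features(x):
    """Variance, skewness and excess kurtosis of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    var = np.mean(xc**2)
    skew = np.mean(xc**3) / var**1.5
    kurt = np.mean(xc**4) / var**2 - 3.0  # excess kurtosis: 0 for a Gaussian
    return var, skew, kurt

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn = -ln(A / B), where B counts pairs of length-m templates within
    tolerance r and A counts pairs of length-(m+1) templates; self-matches
    are excluded because each template is compared only with later ones.
    """
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()
    n = len(x)

    def count_matches(length):
        # Same n - m start positions for both lengths so A and B are comparable.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev (max-abs) distance from template i to all later ones.
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return np.inf  # undefined for this record length and tolerance
    return -np.log(a / b)
```

A regular waveform such as a pure sine gives a low sample entropy, while white noise gives a much higher value, which is the property that makes the measure useful for separating regular and irregular beats.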
Medical data pose an additional challenge, termed class imbalance. Because samples from rare classes are scarce, abnormal data are distributed non-uniformly across classes. Classifiers trained on such data become biased toward the majority class, which challenges supervised machine learning and increases the misdiagnosis rate [38]. Few existing works have addressed heartbeat classification with this limitation in mind.
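One common data-level remedy for this bias is SMOTE, which the present work also evaluates. The minimal NumPy sketch below follows the usual SMOTE recipe: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. The neighbour count `k = 5`, the choice to oversample every minority class up to the majority count, and the helper names are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from minority samples X_min (needs >= 2)."""
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples (fine for small sets).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)             # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)    # minority sample to start from
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def balance_with_smote(X, y, k=5, seed=0):
    """Oversample every minority class (with >= 2 samples) to the majority count."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    for c, cnt in zip(classes, counts):
        if 1 < cnt < target:
            Xs.append(smote(X[y == c], target - cnt, k, seed))
            ys.append(np.full(target - cnt, c))
    return np.vstack(Xs), np.concatenate(ys)
```

Oversampling of this kind should be applied to the training split only; synthesizing samples before the train/test split would leak information into the test set.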
In this paper, we address heartbeat classification under class imbalance. We performed simple statistical tests to understand the nature of ECG signals; these indicated that the ECG is a non-stationary, non-Gaussian signal stemming from nonlinear systems. Therefore, we employed a nonlinear feature extraction scheme to classify five groups of ECG heartbeats, following the recommendation of the Association for the Advancement of Medical Instrumentation (AAMI). We used an adaptive nonlinear decomposition method called improved complete ensemble empirical mode decomposition (ICEEMD) [39]. It is a data-driven method for analyzing non-stationary signals originating from nonlinear systems, and it decomposes a signal into approximately orthogonal oscillatory modes. We then calculated higher order statistics (HOS) and sample entropy from these modes to extract the information concealed in the ECG. Next, we applied data-level sampling techniques, namely random sampling, the synthetic minority oversampling technique (SMOTE) and distribution-based balancing, to study the impact of class imbalance at the training stage. The resulting feature set is then fed to AdaBoost [40], an ensemble classifier that has been little explored for heartbeat classification. Furthermore, we employed a feature selection scheme to discard irrelevant features without decreasing classification performance.
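The excerpt does not specify which AdaBoost variant or base learner is used, so the following is only a sketch under stated assumptions: multiclass AdaBoost in the SAMME form, with one-split decision stumps as weak learners, written in plain NumPy. Each round fits the stump with the lowest weighted error, gives it the weight alpha = ln((1 - err) / err) + ln(K - 1) for K classes, and multiplies the weights of misclassified samples by exp(alpha), so later stumps concentrate on the beats earlier ones got wrong. All function names are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w, classes):
    """Best single split under weights w; each side predicts its weighted-majority class."""
    best = (np.inf, 0, 0.0, classes[0], classes[0])
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            labels = []
            for side in (left, ~left):
                votes = [w[side & (y == c)].sum() for c in classes]
                labels.append(classes[int(np.argmax(votes))])
            pred = np.where(left, labels[0], labels[1])
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, j, thr, labels[0], labels[1])
    return best

def stump_predict(X, j, thr, left_label, right_label):
    return np.where(X[:, j] <= thr, left_label, right_label)

def adaboost_samme_fit(X, y, n_rounds=20):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    K = len(classes)
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, ll, rl = fit_stump(X, y, w, classes)
        err = max(err / w.sum(), 1e-10)  # avoid log(0) for a perfect stump
        if err >= 1 - 1 / K:
            break  # no better than random guessing: stop boosting
        alpha = np.log((1 - err) / err) + np.log(K - 1)  # SAMME learner weight
        miss = stump_predict(X, j, thr, ll, rl) != y
        w = w * np.exp(alpha * miss)  # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, thr, ll, rl))
    return classes, ensemble

def adaboost_samme_predict(model, X):
    classes, ensemble = model
    X = np.asarray(X, dtype=float)
    scores = np.zeros((len(X), len(classes)))
    for alpha, j, thr, ll, rl in ensemble:
        pred = stump_predict(X, j, thr, ll, rl)
        scores += alpha * (pred[:, None] == classes[None, :])  # weighted votes
    return classes[np.argmax(scores, axis=1)]
```

With two classes the ln(K - 1) term vanishes and the update reduces to the familiar binary AdaBoost weighting (up to a constant factor in alpha).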
The rest of the paper is organized as follows: the ECG data set, experimental setup and theoretical background of the methodology are presented in Section 2. Section 3 discusses the simulation results and compares them with existing works. Future challenges are presented in Section 4, and Section 5 concludes the work.
Materials and methods
A typical pattern classification system includes data collection, pre-processing, feature extraction, feature selection, and classification. Fig. 1 illustrates the block diagram of the proposed methodology. In this section, the theoretical background of the employed techniques is discussed.
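As an example of the pre-processing stage, a common approach (and the fixed-length beat segmentation discussed in Section 4) cuts a fixed window around each annotated or detected R peak. The sketch below assumes R-peak sample indices are already available; the 0.25 s before and 0.45 s after the peak are illustrative assumptions, not the window used in this work, and `segment_beats` is a hypothetical name.

```python
import numpy as np

def segment_beats(ecg, r_peaks, fs, before_s=0.25, after_s=0.45):
    """Cut a fixed-length window around each R peak.

    Beats whose window would run past either end of the record are skipped.
    Returns the beat matrix and the R-peak indices that were kept.
    """
    ecg = np.asarray(ecg, dtype=float)
    pre = int(round(before_s * fs))   # samples before the R peak
    post = int(round(after_s * fs))   # samples from the R peak onward
    beats, kept = [], []
    for r in r_peaks:
        if r - pre >= 0 and r + post <= len(ecg):
            beats.append(ecg[r - pre:r + post])
            kept.append(r)
    return np.array(beats), np.array(kept)
```

For MIT-BIH records, which are sampled at 360 Hz, `fs=360` gives windows of 90 + 162 = 252 samples, with the R peak at index 90 of each row.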
Results and discussion
This section discusses the simulation results of the proposed methodology illustrated in Fig. 1. Our main focus is class imbalance and its impact on heartbeat classification. We perform an empirical study of how various data-level pre-processing algorithms, combined with an ensemble classifier, help alleviate the class imbalance problem. The proposed method is evaluated on the MIT-BIH arrhythmia database. In computer-aided diagnosis, feature extraction is of …
Future challenges
The present work demonstrates ECG beat classification based on the AAMI recommendations. However, some issues remain open. One is the intra-patient paradigm, in which heartbeats of the same patient are likely to appear in both the training and testing data; this may lead to biased results. Second, fixed-length beat segmentation is not always preferable, because heart rhythms can vary quickly or slowly. The study of adaptive beat size segmentation is …
Conclusion
Computer-aided heartbeat classification was studied in this work. Class imbalance has a significant effect on heartbeat classification, and we performed an empirical analysis of how to alleviate it. We employed a nonlinear, data-adaptive decomposition method, namely ICEEMD, to extract features from ECG heartbeats. HOS and sample entropy parameters were then calculated from the selected modes obtained from ICEEMD. Next, three data level sampling …
References (72)
Global public health problem of sudden cardiac death. J. Electrocardiol. (2007).
ECG beat classification using neuro-fuzzy network. Pattern Recognit. Lett. (2004).
ECG beat classification using particle swarm optimization and radial basis function neural network. Expert Syst. Appl. (2010).
Arrhythmia recognition and classification using combined linear and nonlinear features of ECG signals. Comput. Methods Progr. Biomed. (2016).
Application of principal component analysis to ECG signals for automated diagnosis of cardiac health. Expert Syst. Appl. (2012).
Feature extraction for ECG heartbeats using higher order statistics of WPD coefficients. Comput. Methods Progr. Biomed. (2012).
A multi-stage automatic arrhythmia recognition and classification system. Comput. Biol. Med. (2011).
Exploiting correlation of ECG with certain EMD functions for discrimination of ventricular fibrillation. Comput. Biol. Med. (2011).
Electrocardiogram beat classification using empirical mode decomposition and multiclass directed acyclic graph support vector machine. Comput. Electr. Eng. (2014).
Linear and nonlinear analysis of normal and CAD-affected heart rate signals. Comput. Methods Progr. Biomed. (2014).