Introduction
Alzheimer’s disease (AD) is a clinical syndrome characterized by the progressive deterioration of memory and cognitive function, particularly in elderly people. The onset of the disease is usually silent, and its progression is slow and irreversible. According to the World Alzheimer Report 2019 [1], more than 50 million people are living with AD, and this figure may rise to 152 million by 2050.
In recent years, AD has attracted increasing attention. So far, only five drugs have been approved by the Food and Drug Administration (FDA) for the treatment of AD [2], and all of them can only delay the progression of the disease and alleviate its symptoms; none can cure it. Consequently, early diagnosis is important so that medication can delay the onset of symptoms. Typically, AD is divided into four stages, and the best time to diagnose the disease is during the early stages of mild cognitive impairment (MCI) and mild AD [3-5].
Electroencephalography (EEG) is the non-invasive acquisition of signals corresponding to the electrical activity of the brain using electrodes positioned directly on the scalp. Magnetoencephalography (MEG) is also a non-invasive technique, which acquires signals by recording the magnetic activity of the brain. Functional magnetic resonance imaging (fMRI) indirectly detects changes in neuronal activity through the associated changes in cerebral blood flow, exploiting the different magnetic properties of hemoglobin in its oxygenated and deoxygenated states. Differences between AD patients and normal control subjects can be detected using these brain signals, each of which comes with its own advantages and disadvantages. Machine learning methods for the classification of AD patients versus normal control subjects using EEG, MEG, and fMRI signals are listed in Table 1.
Table 1
Summary of papers using EEG/MEG/fMRI signals to design a classification system for AD/MCI detection

| Method | Signal | Condition detected | Accuracy | Year |
| --- | --- | --- | --- | --- |
| Correlation, phase synchrony, and Granger causality measures | EEG | MCI and mild AD | 83% and 88%, respectively | 2012 |
| Hybrid feature selection | EEG | MCI and mild AD | 95% and 100%, respectively | 2015 |
| Complex network theory and TSK fuzzy system | EEG | AD | 97.3% | 2019 |
| Functional connectivity and effective connectivity analysis | MEG | AD | 86% | 2019 |
| Phase locking value, imaginary part, and correlation of the envelope | MEG | MCI | 75% | 2019 |
| High-order FC correlations | fMRI | MCI | 88.14% | 2016 |
| Hierarchical high-order functional connectivity networks | fMRI | MCI | 84.85% | 2017 |
| Strength and similarity guided GSR using LOFC and HOFC | fMRI | MCI | 88.5% | 2019 |
With the increasing use of deep learning techniques, many deep AD detection methods have recently emerged. Sarraf and Tofighi [14] used LeNet-5, a convolutional neural network (CNN) architecture, to classify fMRI data from AD subjects and normal controls, reaching an accuracy of 96.85% on the testing dataset. They used 5-fold cross-validation on a dataset containing 28 AD subjects and 15 normal controls. Kim and Kim [15] proposed a classifier based on deep neural networks that uses the relative power of EEG, fully exploiting and recombining features through its own learning structure. Their dataset contained 10 MCI subjects and 10 normal controls, and leave-one-out cross-validation was used to evaluate the model’s performance. The accuracy obtained on the testing dataset was 59.4%. Duan et al. [16] used EEG functional connectivity as the network input to train ResNet-18, achieving accuracies of 93.42% and 98.5% on the MCI and mild AD datasets, respectively, where the former contained 22 MCI subjects and 38 normal controls, and the latter contained 17 mild AD subjects and 24 normal controls.
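To illustrate how a functional connectivity matrix can serve as network input, the following minimal Python sketch computes a Pearson-correlation connectivity matrix from a single EEG trial. Note that Pearson correlation is only one common connectivity measure; the cited work may use a different one, so treat this as an illustrative assumption rather than a reproduction of their pipeline.

```python
import numpy as np

def connectivity_matrix(trial):
    """Pearson-correlation functional connectivity across channels.

    trial: array of shape (n_channels, n_samples)
    returns: (n_channels, n_channels) matrix with values in [-1, 1]
    """
    return np.corrcoef(trial)

# Example: a random 19-channel EEG segment of 1000 samples.
trial = np.random.randn(19, 1000)
fc = connectivity_matrix(trial)
print(fc.shape)  # (19, 19); this image-like matrix can be fed to a 2-D CNN
```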
Among the aforementioned brain signals (EEG, MEG, and fMRI), EEG has the best temporal resolution. Nevertheless, since EEG signals are acquired with electrodes at a limited number of locations on the scalp, their spatial resolution is not as good as that of the other two modalities. Despite this, the spatial distribution of the signals can be optimized in the processing steps with the use of well-designed algorithms [17-21]. Given that EEG signals are easier to acquire and less expensive to record than the other techniques, EEG-based methods for AD detection are currently more popular.
In studies based on EEG signals, deep learning methods are trained on small datasets, as electrophysiological signals are difficult to acquire from AD patients. The learning capability of deep learning models partially relies on their large number of trainable parameters. A large number of samples is required to fit these parameters and avoid over-fitting [22, 23]. One way to deal with this issue is data augmentation.
Data augmentation can be implemented by generating artificial data [24, 25]. One possible way to create new artificial data is to decompose the original EEG signals and recombine the components [26-28]. EEG signals can be decomposed into different filter banks, where in each filter bank the frequency content of the decomposed signal lies within a certain band, and all filter banks together cover a wide range of frequencies. This strategy helps deep learning models achieve better performance on small datasets. Note that the studies implementing this particular data augmentation strategy share the same overall approach but differ in the details of the models used. For instance, Zhao et al. [26] proposed a method that randomly recombines EEG signals across filter banks obtained with the discrete cosine transform; this approach enhances the classification performance of one-dimensional convolutional neural networks in an epileptic seizure focus detection task. Zhang et al. [27] used the augmentation strategy to enhance the classification performance of motor imagery; instead of decomposing signals with the discrete cosine transform, they adopted the empirical mode decomposition (EMD) technique [29]. In the decomposition-recombination strategy, EMD has the advantage that the signals can be recovered by simply adding up the decomposed intrinsic mode functions (IMFs). Besides the decomposition-recombination strategy, generative adversarial networks (GANs) also offer a solution for generating artificial signals [30]. However, GANs require a large dataset to tune their parameters and fit the model. Since the goal of data augmentation in small Alzheimer’s datasets is precisely to address the lack of samples, GANs are not a viable option for generating artificial data in this setting.
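The filter-bank idea can be made concrete with a DCT-based decomposition in the spirit of Zhao et al. [26]. The sketch below is our own illustration, not their released code: the band edges and implementation details are illustrative choices, and when the bands partition the full frequency range, summing the filter-bank outputs recovers the original signal.

```python
import numpy as np
from scipy.fft import dct, idct

def dct_filter_banks(x, fs, bands):
    """Split a 1-D signal into frequency bands via the DCT: transform,
    keep only the coefficients inside each band, and invert.
    If the bands partition [0, fs/2), summing the outputs recovers x."""
    n = len(x)
    coeffs = dct(x, norm='ortho')
    # DCT-II bin k corresponds approximately to frequency k * fs / (2 * n) Hz.
    freqs = np.arange(n) * fs / (2 * n)
    return np.stack([idct(np.where((freqs >= lo) & (freqs < hi), coeffs, 0.0),
                          norm='ortho')
                     for lo, hi in bands])

fs = 256                                    # illustrative sampling rate
t = np.arange(2 * fs) / fs                  # 2 s of signal
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
bands = [(0, 4), (4, 8), (8, 13), (13, 30), (30, fs / 2)]  # delta..gamma
banks = dct_filter_banks(x, fs, bands)
print(banks.shape, np.allclose(banks.sum(axis=0), x))  # (5, 512) True
```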
In this paper, we propose a decomposition and recombination model for data augmentation on a small Alzheimer’s dataset, which is used to distinguish AD patients from normal controls. The decomposition and recombination approach consists of three steps. First, multivariate empirical mode decomposition (MEMD) is used to decompose the EEG signals into IMFs. These IMFs are then randomly recombined within each of the two groups. Finally, in each group, the recombined IMFs are added up to generate a new artificial trial. These artificial trials are used to extend the AD training dataset. A minimal sketch of the recombination step is given below.
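The following Python sketch illustrates the recombination step (our illustration, not released code). It assumes every trial of a class has already been decomposed, e.g., with a MEMD implementation, into the same number of IMFs, stored in a single array.

```python
import numpy as np

def recombine_trials(imfs, n_artificial, seed=None):
    """Create artificial trials by randomly recombining IMFs across
    trials of the same class (controls or patients).

    imfs: array of shape (n_trials, n_imfs, n_channels, n_samples),
          assumed to come from decomposing every trial into the same
          number of IMFs (e.g., with MEMD).
    """
    rng = np.random.default_rng(seed)
    n_trials, n_imfs, _, _ = imfs.shape
    artificial = []
    for _ in range(n_artificial):
        # For each IMF level, draw the IMF from a randomly chosen trial,
        donors = rng.integers(0, n_trials, size=n_imfs)
        # then add the selected IMFs up to reconstruct a new artificial trial.
        artificial.append(imfs[donors, np.arange(n_imfs)].sum(axis=0))
    return np.stack(artificial)

# Example with synthetic data: 20 trials, 6 IMFs, 16 channels, 512 samples.
imfs = np.random.randn(20, 6, 16, 512)
augmented = recombine_trials(imfs, n_artificial=40, seed=0)
print(augmented.shape)  # (40, 16, 512)
```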
This work is organized as follows. "Method" includes the description of the small Alzheimer’s datasets used, the scheme of the proposed decomposition and recombination approach, and the neural networks used for classification. "Results" presents the experimental results, including the classification performance of the neural networks during the training process and the effects of data augmentation on the datasets. These results are then discussed in "Discussion", together with the limitations of the method. Finally, the conclusions are presented in "Conclusion".
Discussion
In this work, we proposed a decomposition and recombination system to enlarge two AD datasets and explored the data augmentation performance on three different neural networks. This work is based on the following two assumptions:
1. The AD dataset is a small dataset.
2. Neural networks need a considerable amount of data to tune their parameters.
Most patients affected by AD are elderly people. In contrast to EEG acquisition from healthy people, AD patients tire easily and are often weak or less cooperative during the recording of EEG signals. Sometimes, the acquisition must even be interrupted for unexpected reasons, such as the non-collaboration of the patients. Therefore, AD datasets are very valuable and are usually small in size. To protect the health of the patients and to facilitate data acquisition in experiments, a data augmentation method is needed for small AD datasets.
When it comes to the second assumption, note that deep neural networks can accurately find the unknown relationship between the raw data and the corresponding labels thanks to their huge number of parameters. At the same time, these parameters can only be learned from the available data, and the higher the number of parameters, the more signals are needed to train the model. Therefore, data augmentation for small AD datasets is again of great interest.
In addition to the decomposition and recombination strategy, generative adversarial networks (GANs) are also a universal solution for time series data augmentation. However, both the generator and the discriminator require a certain amount of data to tune their parameters. For an AD dataset of limited size, this requirement is not met, and hence GANs are not suitable in this case.
In the classification of mild AD, data augmentation has a positive effect on the training of ResNet. When the number of artificial trials increases, the average accuracy of ResNet increases from 72.38% to 77.62%, with a consistent performance. In the case of BrainNet CNN, data augmentation also improves the classification performance on the mild AD dataset. However, on the MCI dataset this effect is only positive for a small number of artificial trials; if the number of artificial trials rises above 30, the mean accuracy decreases. Finally, EEGNet is the network with the poorest results on the mild AD dataset, and artificial trials have only a moderate positive effect on the MCI dataset, again only when the number of artificial trials is small.
In Fig. 11, the confusion matrices before and after data augmentation are given. Both ResNet and BrainNet CNN obtain consistent increases in accuracy, sensitivity, and precision when 10 artificial trials per class are used. As expected, the improvement is more noticeable on the mild AD dataset, since its two classes (controls and patients) are more separated than in the MCI case, where the patients are closer to the control subjects.
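For reference, accuracy, sensitivity, and precision can be read directly off a 2x2 confusion matrix. The numbers in the sketch below are purely illustrative and are not values from our experiments.

```python
import numpy as np

# Purely illustrative 2x2 confusion matrix (rows: true class, columns:
# predicted class), with class 0 = control and class 1 = patient.
cm = np.array([[18, 4],
               [3, 17]])

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
sensitivity = tp / (tp + fn)   # recall on the patient class
precision = tp / (tp + fp)
print(f"acc={accuracy:.3f}, sens={sensitivity:.3f}, prec={precision:.3f}")
```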
Summarizing the above experiments, the proposed decomposition and recombination system helps the training of neural networks on small AD datasets, and it seems that enlarging the training set by just a factor of 2 is enough. Having more artificial data does not always yield better results, as our experiments show. The effects of data augmentation depend on two factors: (i) the type of neural network and (ii) the dataset. The appropriate number of artificial trials is influenced by both factors, and determining its optimal value requires further experiments.
One possible reason why the proposed data augmentation method does not always improve the accuracy lies in the different characteristics of the two datasets. In Fig. 9a, the accuracy of ResNet on the mild AD dataset converges as the number of training epochs increases, and the result is stable during training, with a small variance around the mean accuracy. However, in Fig. 9d, the accuracy on the MCI dataset still fluctuates over a larger range, especially compared with the mild AD dataset. This means that the network is more difficult to fit on the MCI dataset, or perhaps that the quality of the data is worse in that case. Although data augmentation improves the accuracy on the MCI dataset only slightly when the number of artificial trials is small, it still helps to train ResNet: after data augmentation, fewer training epochs are needed for the accuracy to converge, as shown in Fig. 9d. Similar fluctuations can be observed for the BrainNet CNN network on both datasets (Fig. 9b, e), which could explain why data augmentation does not help in this case.
The proposed decomposition and recombination system has its own limitations. No pre-processing was applied to remove artifacts or noise from the databases used in the experiments. Since the proposed method recombines all the information present in the data to enlarge the training set, artifacts and noise may also be replicated, which would negatively affect the results. Another aspect that may play a role is the decomposition method used. Here, we combine SEMD and MEMD, but other EMD-based methods have been proposed in the literature. Each method has different properties that impact the frequency mixing effect (overlapping of IMFs) and hence may influence the quality of the artificial trials. Moreover, as shown above, the required number of artificial trials is unknown and should be further investigated. More experiments are also needed to determine the number of epochs in the training phase, as our results indicate that the use of artificial trials may help to reduce the number of epochs and thus control possible over-fitting. All of these aspects are under consideration, and we expect to propose more reliable methods in future work.