1 Introduction
- Non-Stationary and Nonlinear Analysis—Electroencephalogram (EEG) signals are non-stationary: their statistical properties vary over time. They are also nonlinear, i.e., their dynamics do not follow a straight-line relationship. Traditional time-series methods often assume stationarity and linearity, and can become inaccurate when these assumptions are violated. Fractal dimensions, in contrast, can handle non-stationary and nonlinear time series, making them well suited to EEG analysis [23].
- Understanding Complexity—Parkinson’s disease perturbs global brain activity, which is reflected in changes in the variability, complexity, and unpredictability of EEG signals [24]. Fractal dimensions, by definition, measure the complexity or "roughness" of a time series. Hence, by analyzing EEG signals with fractal dimensions, researchers can quantify these changes in complexity, which aids in studying Parkinson’s disease efficiently.
- Time-Domain Feature Extraction—Another advantage of fractal dimensions over many other methods is that they extract features directly from the temporal domain of the EEG signal [25]. In contrast, methods such as partial directed coherence (PDC), Shannon entropy, wavelet transforms, and Fourier transforms primarily derive their features from the EEG signal’s frequency domain.
- Simplicity and Efficiency—Despite their ability to capture complex characteristics of EEG signals, fractal dimensions are mathematically straightforward and computationally light [26]. They can be computed directly from raw EEG data without complex transformations, which makes them efficient for analyzing large datasets or for use in systems where computational resources or processing time are limited. This dynamic nature also makes fractal dimensions effective on real-time data for use cases that require quick turnaround.
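As a concrete illustration of this simplicity, the Petrosian fractal dimension (detailed in Sect. 3.3.3) reduces to a first difference and a sign-change count over the raw samples. The sketch below is a generic reference implementation, not the study's code:

```python
import numpy as np

def petrosian_fd(x):
    """Petrosian fractal dimension of a 1-D series, computed directly
    from raw samples: only a first difference and a sign-change count."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diff = np.diff(x)
    # count sign changes in the first derivative of the signal
    n_delta = np.sum(diff[1:] * diff[:-1] < 0)
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))
```

Because the whole computation is a single pass over the samples, it scales linearly with signal length, which is what makes it attractive for large EEG datasets and real-time settings.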
- To assess the efficacy of various fractal dimension methods in extracting features from the EEG signal’s temporal domain to discern Parkinson’s disease (PD) patients from healthy individuals.
- To determine the optimal machine learning pipeline, comprising the best-performing segmentation technique, feature extraction method, and supervised learning algorithm, for detecting PD in patients who are ON and OFF medication, with nearly equal performance in both states.
- To identify and visualize the brain regions that are impacted more than others by using explainable AI.
- To investigate whether fractal dimensions can be used to accurately assess the significance of the brain regions most affected by PD.
RQ: How effectively can fractal dimensions, as feature extraction measures, discriminate Parkinson’s disease (PD) patients who are ON or OFF medication from healthy controls (HC)?

The rest of this paper is organized as follows. Section 2 comprehensively reviews state-of-the-art Parkinson’s disease (PD) detection techniques. Section 3 presents the overall methodology and solution design for constructing an optimal pipeline by experimenting with various pre-processing, segmentation, feature extraction, and machine learning techniques. Section 4 unveils the study’s findings, which are then elaborated upon in Sect. 5. Finally, Sect. 6 derives inferences from the observations while highlighting the contribution of this study to the existing literature.
2 Related work
3 Materials and methods
IF EEG features are extracted using fractal dimensions (FD) computed over segments produced by applying an overlapping sliding-window technique to multichannel EEG signals, and these features are given as input to a classifier, THEN there exists an optimal FD and classifier combination that discriminates Parkinson’s disease (PD) patients who are ON and OFF medication from healthy controls with greater than 99% classification accuracy.
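The overlapping sliding-window segmentation named in the hypothesis can be sketched as follows (the window length and overlap values in the docstring are illustrative, not parameters taken from this study):

```python
import numpy as np

def sliding_windows(signal, win_len, overlap):
    """Segment a 1-D signal into overlapping windows.

    overlap is a fraction in [0, 1), e.g. 0.5 for 50% or 0.9 for 90%.
    For multichannel EEG, apply the function to each channel separately.
    """
    # round() guards against float artifacts, e.g. 100 * (1 - 0.9) = 9.999...
    step = max(1, int(round(win_len * (1 - overlap))))
    return np.array([signal[i:i + win_len]
                     for i in range(0, len(signal) - win_len + 1, step)])
```

For a 1000-sample signal with 100-sample windows, 50% overlap yields 19 windows while 90% overlap yields 91, which is why higher overlap produces far more training segments from the same recording.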
3.1 Dataset
3.2 Pre-processing
3.3 Feature extraction
3.3.1 Higuchi’s fractal dimension
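A generic reference implementation of Higuchi's algorithm is sketched below (not the authors' code; `kmax`, the largest time scale considered, is a free parameter rather than a value from this study):

```python
import numpy as np

def higuchi_fd(x, kmax=10):
    """Higuchi fractal dimension of a 1-D series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean_lengths = []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):                       # k sub-series offsets
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            # normalized curve length of the sub-series x[m], x[m+k], ...
            d = np.sum(np.abs(np.diff(x[idx])))
            lengths.append(d * (n - 1) / ((len(idx) - 1) * k * k))
        mean_lengths.append(np.mean(lengths))
    # the FD is the slope of log L(k) against log(1/k)
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)),
                          np.log(mean_lengths), 1)
    return slope
```

As a sanity check, a straight line yields a dimension of 1, while white noise approaches 2.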
3.3.2 Katz fractal dimension
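The Katz dimension follows from three quantities of the waveform: its total curve length L, the maximum distance d from the first sample, and the number of steps n. A minimal sketch (a generic implementation, not the authors' code; it assumes a non-constant input so neither logarithm's argument is zero):

```python
import numpy as np

def katz_fd(x):
    """Katz fractal dimension: D = log10(n) / (log10(n) + log10(d / L))."""
    x = np.asarray(x, dtype=float)
    n = len(x) - 1                      # number of steps in the waveform
    L = np.sum(np.abs(np.diff(x)))      # total curve length
    d = np.max(np.abs(x - x[0]))        # max distance from the first point
    return np.log10(n) / (np.log10(n) + np.log10(d / L))
```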
3.3.3 Petrosian fractal dimension
3.3.4 Spectral entropy
3.3.5 Permutation entropy
3.4 Classification
3.4.1 Hyperparameter tuning
- KNN
  - leaf_size (leaf size passed to the underlying BallTree/KDTree, affecting query speed) = 30
  - metric (distance metric used between points) = ‘euclidean’
  - n_neighbors (number of neighbors to use) = 6
  - p (power parameter for the Minkowski metric; p = 2 corresponds to the Euclidean distance) = 2
  - weights (weight function used in prediction) = ‘uniform’
- XGBoost
  - learning_rate or eta (step-size shrinkage to prevent overfitting) = 0.1
  - n_estimators (number of boosting rounds) = 280
  - max_depth (maximum depth of a tree) = 8
  - colsample_bytree (proportion of features randomly sampled when building each tree) = 1
  - reg_alpha (L1 regularization term) = 0.05
- LGBM
  - learning_rate (step-size shrinkage to prevent overfitting) = 0.1
  - n_estimators (number of boosting rounds) = 260
  - max_depth (maximum depth of a tree; -1 means no limit) = -1
  - min_child_samples (minimum number of data instances needed in a leaf) = 41
  - num_leaves (maximum number of leaf nodes per tree) = 70
  - reg_lambda (L2 regularization term) = 3
  - bagging_fraction (proportion of data randomly sampled at each iteration) = 0.6
  - bagging_freq (perform bagging every k iterations) = 2
  - boosting_type = ‘gbdt’
- Extra Trees
  - criterion (function for measuring split quality) = ‘gini’
  - n_estimators (number of trees to grow) = 150
  - max_depth (maximum depth of a tree) = None
  - min_samples_split (minimum number of samples required to split an internal node) = 30
  - min_samples_leaf (minimum number of samples required at a leaf node) = 15
- Random Forest
  - criterion (function for measuring split quality) = ‘gini’
  - n_estimators (number of trees in the forest) = 120
  - max_depth (maximum depth of the trees) = None
  - min_samples_split (minimum number of samples required to split an internal node) = 28
  - min_samples_leaf (minimum number of samples required at a leaf node) = 10
- Quadratic Discriminant Analysis
  - reg_param (regularization parameter that biases the estimated covariance matrices toward being well-conditioned) = 0.1
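As a sketch of how such settings map onto code (assuming scikit-learn's `KNeighborsClassifier` API; this is illustrative, not the authors' implementation), the tuned KNN configuration would be instantiated as:

```python
from sklearn.neighbors import KNeighborsClassifier

# Tuned KNN settings listed above (scikit-learn API assumed).
# Note: with metric='euclidean', the Minkowski power parameter p is
# ignored; p=2 is listed because Minkowski with p=2 equals Euclidean.
knn = KNeighborsClassifier(
    n_neighbors=6,
    weights="uniform",
    metric="euclidean",
    p=2,
    leaf_size=30,
)
```

The other models follow the same pattern with their respective libraries (`xgboost.XGBClassifier`, `lightgbm.LGBMClassifier`, and scikit-learn's `ExtraTreesClassifier`, `RandomForestClassifier`, and `QuadraticDiscriminantAnalysis`).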
3.4.2 Explainable AI
3.5 Evaluation metrics
- Precision: It answers the question, “Out of all the positive predictions made by the machine learning model, how many were actually correct?” As represented in Eq. 19, precision decreases as the number of false positives increases.$$\begin{aligned} \text {Precision} = \dfrac{\text {True Positives}}{\text {True Positives} + \text {False Positives}} \end{aligned}$$(19)
- Recall: It answers the question, “Out of all the actual positive instances, how many were correctly identified by the machine learning model?” As represented in Eq. 20, recall decreases as the number of false negatives increases.$$\begin{aligned} \text {Recall} = \dfrac{\text {True Positives}}{\text {True Positives} + \text {False Negatives}} \end{aligned}$$(20)
- F1 score: It is the harmonic mean of precision and recall and indicates the model’s overall performance while accounting for both false negatives and false positives.$$\begin{aligned} F1 \text { score} = 2 * \dfrac{\text {Precision} * \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$(21)
- Accuracy: It answers the question, “Out of all the predictions made by the machine learning model, how many were correct?” In Eq. 22, TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.$$\begin{aligned} \text {Accuracy} = \dfrac{\text {TP} + \text {TN}}{\text {TP} + \text {FP} + \text {TN} + \text {FN}} * 100\% \end{aligned}$$(22)$$\begin{aligned} \text {Accuracy} = \dfrac{N_{\text {correct}}}{N_{\text {total}}} * 100\% \end{aligned}$$(23)
- Area under the ROC curve (AUC): An ROC (receiver operating characteristic) curve represents the performance of a classifier with respect to two parameters: the true-positive rate (TPR), which is synonymous with recall (Eq. 24), and the false-positive rate (FPR) (Eq. 25).$$\begin{aligned} \text {TPR}= & {} \dfrac{\text {TP}}{\text {TP} + \text {FN}} \end{aligned}$$(24)$$\begin{aligned} \text {FPR}= & {} \dfrac{\text {FP}}{\text {FP} + \text {TN}} \end{aligned}$$(25)The ROC curve plots TPR on the y-axis against FPR on the x-axis at different classification thresholds; lowering the threshold classifies more instances as positive. AUC quantifies the two-dimensional area under the ROC curve, representing an aggregate performance measure over all possible classification thresholds.
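All five point metrics can be computed directly from the four confusion-matrix counts; the following plain-Python sketch mirrors Eqs. 19–25 (for illustration; it assumes no denominator is zero):

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1, accuracy (%), and FPR from confusion counts,
    mirroring Eqs. 19-25."""
    precision = tp / (tp + fp)                          # Eq. 19
    recall = tp / (tp + fn)                             # Eq. 20; also TPR, Eq. 24
    f1 = 2 * precision * recall / (precision + recall)  # Eq. 21
    accuracy = (tp + tn) / (tp + fp + tn + fn) * 100    # Eqs. 22-23
    fpr = fp / (fp + tn)                                # Eq. 25
    return precision, recall, f1, accuracy, fpr
```

AUC itself is threshold-dependent and is instead computed by integrating the (FPR, TPR) curve over all thresholds, e.g. with scikit-learn's `roc_auc_score`.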
4 Results
4.1 Sliding windowing with 50% overlap
4.1.1 PD patients OFF medication
| Classifier model | Feature extraction | Accuracy | AUC | Recall | Precision | F1 score |
|---|---|---|---|---|---|---|
| KNN | Higuchi | 94.45 | 98.38 | 93.61 | 95.65 | 94.61 |
| | Katz | 74.13 | 81.53 | 89.53 | 69.46 | 78.19 |
| | Petrosian | 89.82 | 95.76 | 88.65 | 91.40 | 89.98 |
| | Spec-ent | 86.20 | 92.95 | 90.78 | 84.15 | 87.31 |
| | Perm-ent | 89.67 | 95.57 | 89.41 | 90.79 | 90.08 |
| QDA | Higuchi | 91.86 | 97.32 | 91.37 | 92.87 | 92.11 |
| | Katz | 78.10 | 86.29 | 83.39 | 76.51 | 79.78 |
| | Petrosian | 85.68 | 93.63 | 83.86 | 87.93 | 85.81 |
| | Spec-ent | 84.50 | 92.77 | 90.06 | 82.13 | 85.87 |
| | Perm-ent | 84.94 | 93.15 | 82.98 | 87.69 | 85.23 |
| XGB | Higuchi | 93.00 | 98.13 | 92.41 | 94.07 | 93.22 |
| | Katz | 77.06 | 84.72 | 80.90 | 76.31 | 78.47 |
| | Petrosian | 88.59 | 95.55 | 87.44 | 90.18 | 88.76 |
| | Spec-ent | 87.10 | 94.68 | 88.75 | 86.88 | 87.78 |
| | Perm-ent | 88.03 | 95.28 | 87.31 | 89.61 | 88.42 |
| LGB | Higuchi | 92.27 | 97.81 | 91.21 | 93.82 | 92.47 |
| | Katz | 77.29 | 85.20 | 81.38 | 76.39 | 78.77 |
| | Petrosian | 88.24 | 95.22 | 86.80 | 90.08 | 88.39 |
| | Spec-ent | 86.76 | 94.15 | 88.67 | 86.42 | 87.49 |
| | Perm-ent | 87.70 | 95.04 | 86.12 | 90.01 | 88.00 |
| ET | Higuchi | 91.27 | 97.21 | 88.78 | 94.18 | 91.36 |
| | Katz | 75.79 | 83.24 | 83.06 | 73.65 | 78.04 |
| | Petrosian | 87.76 | 94.51 | 84.94 | 90.79 | 87.74 |
| | Spec-ent | 85.89 | 93.59 | 87.48 | 85.82 | 86.63 |
| | Perm-ent | 87.12 | 94.31 | 84.73 | 90.10 | 87.32 |
| RF | Higuchi | 90.86 | 96.48 | 88.94 | 93.27 | 91.01 |
| | Katz | 74.76 | 85.26 | 80.09 | 73.56 | 76.66 |
| | Petrosian | 86.29 | 93.59 | 83.78 | 89.06 | 86.29 |
| | Spec-ent | 84.69 | 92.52 | 86.12 | 84.86 | 85.45 |
| | Perm-ent | 85.81 | 93.35 | 84.13 | 88.29 | 86.13 |
4.1.2 PD patients ON medication
| Classifier model | Feature extraction | Accuracy | AUC | Recall | Precision | F1 score |
|---|---|---|---|---|---|---|
| KNN | Higuchi | 96.46 | 98.91 | 96.23 | 97.28 | 96.75 |
| | Katz | 76.26 | 83.74 | 92.56 | 71.99 | 80.98 |
| | Petrosian | 90.42 | 96.66 | 93.86 | 89.29 | 91.50 |
| | Spec-ent | 88.42 | 95.20 | 90.83 | 88.26 | 89.49 |
| | Perm-ent | 91.07 | 96.31 | 90.95 | 92.62 | 91.75 |
| QDA | Higuchi | 94.18 | 98.70 | 95.20 | 94.25 | 94.71 |
| | Katz | 79.43 | 86.68 | 84.83 | 79.04 | 81.82 |
| | Petrosian | 87.34 | 94.98 | 88.48 | 88.46 | 88.46 |
| | Spec-ent | 85.40 | 93.08 | 90.67 | 83.79 | 87.06 |
| | Perm-ent | 87.29 | 94.98 | 88.01 | 88.75 | 88.35 |
| XGB | Higuchi | 95.26 | 99.01 | 96.08 | 95.35 | 95.70 |
| | Katz | 78.11 | 85.80 | 83.96 | 77.73 | 80.72 |
| | Petrosian | 89.83 | 96.12 | 91.17 | 90.43 | 90.78 |
| | Spec-ent | 88.42 | 95.20 | 90.83 | 88.26 | 89.49 |
| | Perm-ent | 89.77 | 96.49 | 90.91 | 90.51 | 90.66 |
| LGB | Higuchi | 94.81 | 98.99 | 95.8 | 94.75 | 95.29 |
| | Katz | 78.91 | 86.66 | 84.79 | 78.36 | 81.44 |
| | Petrosian | 88.84 | 96.00 | 90.30 | 89.50 | 89.87 |
| | Spec-ent | 87.77 | 94.93 | 90.31 | 87.61 | 88.91 |
| | Perm-ent | 89.79 | 96.34 | 91.62 | 89.96 | 90.75 |
| ET | Higuchi | 94.70 | 98.77 | 95.32 | 95.05 | 95.17 |
| | Katz | 76.63 | 84.64 | 87.02 | 74.45 | 80.24 |
| | Petrosian | 88.64 | 95.73 | 89.63 | 89.69 | 89.64 |
| | Spec-ent | 86.99 | 94.07 | 90.27 | 86.44 | 88.28 |
| | Perm-ent | 88.53 | 95.66 | 89.68 | 89.43 | 89.52 |
| RF | Higuchi | 93.94 | 98.25 | 94.69 | 94.31 | 94.48 |
| | Katz | 76.59 | 84.74 | 84.36 | 75.62 | 79.73 |
| | Petrosian | 88.31 | 95.22 | 89.39 | 89.35 | 89.35 |
| | Spec-ent | 85.95 | 93.50 | 88.27 | 86.24 | 87.20 |
| | Perm-ent | 88.14 | 95.43 | 88.96 | 89.35 | 89.12 |
4.2 Sliding windowing with 90% overlap
4.2.1 PD patients OFF medication
| Classifier model | Feature extraction | Accuracy | AUC | Recall | Precision | F1 score |
|---|---|---|---|---|---|---|
| KNN | Higuchi | 99.45 | 99.96 | 99.11 | 99.89 | 99.50 |
| | Katz | 83.87 | 91.52 | 87.99 | 83.32 | 85.59 |
| | Petrosian | 99.48 | 99.98 | 99.41 | 99.63 | 99.52 |
| | Spec-ent | 97.19 | 99.52 | 98.39 | 96.54 | 97.45 |
| | Perm-ent | 99.42 | 99.98 | 99.35 | 99.60 | 99.47 |
| QDA | Higuchi | 93.35 | 98.10 | 93.51 | 94.34 | 93.92 |
| | Katz | 80.93 | 89.16 | 86.11 | 80.28 | 83.09 |
| | Petrosian | 87.87 | 95.17 | 87.56 | 89.90 | 88.71 |
| | Spec-ent | 86.69 | 94.12 | 91.04 | 85.56 | 88.21 |
| | Perm-ent | 87.31 | 94.80 | 86.84 | 89.73 | 88.25 |
| XGB | Higuchi | 97.90 | 99.77 | 97.72 | 98.45 | 98.08 |
| | Katz | 83.87 | 91.52 | 87.99 | 83.32 | 85.59 |
| | Petrosian | 95.79 | 99.25 | 95.24 | 96.98 | 96.10 |
| | Spec-ent | 93.79 | 98.45 | 94.63 | 94.06 | 94.34 |
| | Perm-ent | 95.55 | 99.17 | 95.13 | 96.71 | 95.91 |
| LGB | Higuchi | 96.22 | 99.43 | 95.90 | 97.18 | 96.53 |
| | Katz | 82.88 | 90.64 | 87.39 | 82.27 | 84.75 |
| | Petrosian | 93.42 | 98.35 | 92.38 | 95.40 | 93.86 |
| | Spec-ent | 91.20 | 97.30 | 92.43 | 91.57 | 92.00 |
| | Perm-ent | 93.14 | 98.20 | 92.45 | 94.93 | 93.67 |
| ET | Higuchi | 98.22 | 99.86 | 97.50 | 99.25 | 98.37 |
| | Katz | 81.87 | 90.27 | 88.34 | 80.33 | 84.14 |
| | Petrosian | 97.38 | 99.74 | 96.43 | 98.74 | 97.57 |
| | Spec-ent | 94.53 | 98.85 | 94.76 | 95.21 | 94.99 |
| | Perm-ent | 97.38 | 99.71 | 96.54 | 98.66 | 97.59 |
| RF | Higuchi | 96.99 | 99.62 | 96.18 | 98.30 | 97.22 |
| | Katz | 81.63 | 89.51 | 86.52 | 81.03 | 83.68 |
| | Petrosian | 95.98 | 99.33 | 94.88 | 97.68 | 96.26 |
| | Spec-ent | 92.52 | 97.89 | 92.79 | 93.49 | 93.13 |
| | Perm-ent | 95.33 | 99.12 | 94.23 | 97.18 | 95.68 |
4.2.2 PD patients ON medication
| Classifier model | Feature extraction | Accuracy | AUC | Recall | Precision | F1 score |
|---|---|---|---|---|---|---|
| KNN | Higuchi | 99.65 | 99.95 | 99.59 | 99.77 | 99.68 |
| | Katz | 83.11 | 90.47 | 95.11 | 78.62 | 86.08 |
| | Petrosian | 99.60 | 99.98 | 99.34 | 99.92 | 99.63 |
| | Spec-ent | 97.02 | 99.55 | 98.41 | 96.24 | 97.31 |
| | Perm-ent | 99.57 | 99.98 | 99.37 | 99.85 | 99.61 |
| QDA | Higuchi | 94.18 | 98.70 | 95.20 | 94.25 | 94.71 |
| | Katz | 82.46 | 90.44 | 86.64 | 82.35 | 84.44 |
| | Petrosian | 89.62 | 96.47 | 89.76 | 91.17 | 90.46 |
| | Spec-ent | 86.63 | 94.13 | 91.41 | 85.25 | 88.22 |
| | Perm-ent | 88.83 | 95.99 | 89.00 | 90.48 | 89.73 |
| XGB | Higuchi | 95.26 | 99.01 | 96.08 | 95.35 | 95.70 |
| | Katz | 84.78 | 92.36 | 88.66 | 84.42 | 86.49 |
| | Petrosian | 96.08 | 99.34 | 96.33 | 96.51 | 96.42 |
| | Spec-ent | 94.08 | 98.65 | 95.35 | 93.94 | 94.64 |
| | Perm-ent | 95.62 | 99.25 | 96.12 | 95.91 | 96.01 |
| LGB | Higuchi | 94.81 | 98.99 | 95.88 | 94.75 | 95.29 |
| | Katz | 84.01 | 91.76 | 86.60 | 83.36 | 85.89 |
| | Petrosian | 93.86 | 98.61 | 94.25 | 94.53 | 94.39 |
| | Spec-ent | 92.00 | 97.76 | 93.81 | 91.78 | 92.78 |
| | Perm-ent | 93.75 | 98.55 | 94.36 | 94.26 | 94.31 |
| ET | Higuchi | 94.70 | 98.77 | 95.32 | 95.05 | 95.17 |
| | Katz | 83.21 | 91.20 | 89.54 | 81.66 | 85.41 |
| | Petrosian | 97.84 | 99.76 | 97.75 | 98.31 | 98.03 |
| | Spec-ent | 95.06 | 99.07 | 96.48 | 94.61 | 95.53 |
| | Perm-ent | 97.72 | 99.72 | 97.67 | 98.16 | 97.91 |
| RF | Higuchi | 93.94 | 98.25 | 94.69 | 94.31 | 94.48 |
| | Katz | 83.25 | 90.86 | 87.25 | 83.11 | 85.13 |
| | Petrosian | 96.59 | 99.45 | 96.55 | 97.22 | 96.88 |
| | Spec-ent | 93.38 | 98.26 | 94.28 | 93.67 | 93.98 |
| | Perm-ent | 95.82 | 99.25 | 95.94 | 96.42 | 96.17 |
5 Discussion
- Segmentation: sliding windowing with \(90\%\) overlap.
- Feature extraction: Higuchi’s technique to compute the fractal dimension (FD) of every window.
- Classifier: k-nearest neighbors (KNN).
- Accuracy yielded: \(99.65\pm 0.15\%\) for “ON Meds vs. HC” and \(99.45\pm 0.18\%\) for “OFF Meds vs. HC.”
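The reported optimal pipeline (90%-overlap sliding windows, Higuchi FD features, KNN) can be assembled end to end as sketched below. This is a self-contained illustration only: it uses synthetic signals (white noise vs. low-pass-filtered noise standing in for the two EEG classes), a single-channel toy setup, and training-set accuracy, none of which come from the study.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def sliding_windows(sig, win, overlap):
    """Overlapping sliding-window segmentation (overlap as a fraction)."""
    step = max(1, int(round(win * (1 - overlap))))
    return [sig[i:i + win] for i in range(0, len(sig) - win + 1, step)]

def higuchi_fd(x, kmax=10):
    """Higuchi fractal dimension of a 1-D series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean_lengths = []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            d = np.sum(np.abs(np.diff(x[idx])))
            lengths.append(d * (n - 1) / ((len(idx) - 1) * k * k))
        mean_lengths.append(np.mean(lengths))
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)),
                          np.log(mean_lengths), 1)
    return slope

# Synthetic stand-ins for two classes: white noise (rougher, FD near 2)
# vs. smoothed noise (lower FD) -- NOT real EEG data.
rng = np.random.default_rng(42)
X, y = [], []
for label in (0, 1):
    for _ in range(5):
        sig = rng.standard_normal(2000)
        if label == 1:
            sig = np.convolve(sig, np.ones(10) / 10, mode="same")
        for w in sliding_windows(sig, 200, 0.9):   # 90% overlap
            X.append([higuchi_fd(w)])              # one FD feature per window
            y.append(label)

knn = KNeighborsClassifier(n_neighbors=6, metric="euclidean")
knn.fit(X, y)
train_acc = knn.score(X, y)
```

Because the Higuchi FD of the two synthetic classes is well separated, the toy classifier is near-perfect on its own training windows; on real EEG the study instead evaluates held-out data, per-class, per-medication-state.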
- Early Detection and Diagnosis: The high accuracy achieved by our model indicates its potential as a non-invasive diagnostic tool. Early detection of Parkinson’s disease can enable timely medical intervention, potentially slowing the progression of the disease.
- Personalized Treatment: The machine learning model can be further fine-tuned with additional patient data, facilitating more personalized treatment based on an individual’s EEG patterns.
- Continuous Patient Monitoring: Wearable EEG devices combined with our detection algorithm can be used for continuous monitoring, providing real-time feedback to both patients and caregivers about the current neurological state. Fractal dimensions, owing to their dynamic nature, are especially adept at capturing biomarkers in real-time data.
- Telemedicine and Remote Monitoring: With the growing adoption of remote healthcare, our approach can be integrated into telemedicine platforms, allowing remote monitoring of Parkinson’s patients and timely interventions based on EEG readings.
- Research Tool: The model can serve as a tool in clinical research, providing insights into the neurological changes that accompany different stages of Parkinson’s disease or the effects of experimental treatments. Furthermore, it can be packaged into software or an electronic platform that clinicians use as an automated tool providing a preliminary diagnosis or the likelihood of a neurodegenerative condition.