Introduction
- Timely intervention and treatment: Detecting cognitive impairment in its early stages allows for prompt intervention and the implementation of appropriate treatments [4]. Certain cognitive disorders, such as Alzheimer’s disease, may benefit from early pharmacological or non-pharmacological interventions, potentially slowing down the progression of symptoms.
- Improved quality of life: Early detection enables individuals to receive timely support and resources to cope with cognitive changes. This can enhance their overall quality of life by providing them with tools and strategies to manage cognitive challenges and maintain independence for as long as possible.
- Reduced caregiver burden: Identifying cognitive impairment early allows caregivers to plan and adapt to the evolving needs of the individual. This proactive approach can reduce caregiver burden by facilitating better preparation, support, and the development of coping mechanisms.
- Patient and family empowerment: Early detection empowers individuals and their families with knowledge about the condition. It allows for informed decision-making regarding future care plans, legal matters, and financial arrangements, promoting a sense of control and autonomy.
- Facilitation of research: Early detection contributes valuable data for research purposes, aiding scientists and healthcare professionals in understanding the progression of cognitive disorders. This, in turn, can lead to the development of more effective treatments and interventions.
State of the Art
Cognitive and Biologically Inspired Approach
Methodology
Pentagon Copying Test
| Score | Condition |
|---|---|
| 10 (normal) | All sides are equal, all the angles of the figures are present, and the two figures intersect |
| 9 | One or two sides are of different length |
| 8 | Same as score 9, but no intersection |
| 7 | Loss of one or more angles |
| 6 | One pentagon incomplete |
| 5 | Reduced number of sides |
| 4 | Loss of sides and angles |
| 3 | Grossly incomplete sides |
| 2 | Not interpretable |
| 1 | No reasonable attempt at drawing; only a squiggle or scrawl |
Database
- Drawings: (a) two-pentagon copy test, (b) house copy test, (c) spring drawing, (d) Archimedes spiral, (e) concentric circles performed at regular speed, (f) straight line connecting two dots without touching the lower and upper black bars. Figure 4 shows the template used for these tasks.
- Handwriting: (1) signature, performed twice; (2) copy of words in capital letters; (3) copy of a sentence in cursive letters. For more information on the different handwriting tasks, see [10].
- The PDT is successful if the two pentagons intersect at two points.
- Each pentagon must have exactly five sides and five angles, and the two figures must interlock at two points of contact.
- The angles need not be equal, but the pentagons must not be open at any corner.
- Small errors are allowed when they are almost imperceptible, as are lines that are not completely straight when tremors are evident.
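The acceptance criteria above can be checked programmatically once the pentagon vertices have been extracted (for example, from the pen trajectory). The sketch below is illustrative only, not the scoring procedure used in this work: it assumes idealized vertex lists and counts strict edge crossings between the two polygons, ignoring touching or collinear contacts.

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); its sign gives the orientation of b w.r.t. line o-a
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def segments_cross(p1, p2, q1, q2):
    # True only for a proper crossing: each segment's endpoints lie strictly
    # on opposite sides of the other segment's supporting line
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def boundary_crossings(poly_a, poly_b):
    # Count proper intersections between the edges of two closed polygons
    edges = lambda poly: [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]
    return sum(
        segments_cross(p1, p2, q1, q2)
        for p1, p2 in edges(poly_a)
        for q1, q2 in edges(poly_b)
    )

def pdt_success(poly_a, poly_b):
    # Both figures must have five sides/angles and interlock at exactly two points
    return len(poly_a) == 5 and len(poly_b) == 5 and boundary_crossings(poly_a, poly_b) == 2

# Two convex pentagons, the second shifted so the figures interlock at two points
pent_a = [(0, 2), (-2, 0.5), (-1.2, -1.8), (1.2, -1.8), (2, 0.5)]
pent_b = [(x + 2.5, y + 0.1) for x, y in pent_a]
print(pdt_success(pent_a, pent_b))  # True
```

In practice the hard part is recovering clean vertex lists from a hand drawing with tremor; this check only formalizes the "two pentagons, two points of contact" rule once vertices are available.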
Shot Boundary Detection with Background Subtraction
Video Dataset Creation with Shot Boundary Detection
Shot Boundary Transformer Detection
Experimental Verification
- Lower motion threshold (-min-percent): set at 1.0%. This lower threshold allows scene changes to be detected even with minimal movement, ensuring that no significant details in the patients’ actions are overlooked.
- Upper motion threshold (-max-percent): set at 10.0%. This limit ensures that excessively large or sudden movements are not mistakenly interpreted as scene changes, maintaining focus on the patients’ relevant actions.
- Warm-up period (-warmup): set at 200 frames. This initial phase allows the system to adapt to the video context before detection begins, enhancing accuracy in distinguishing the patients’ key actions.
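The three parameters act as a simple gating rule on the per-frame foreground percentage produced by background subtraction. The toy sketch below illustrates that logic on a synthetic sequence of foreground percentages; it is an assumption about how such a detector can be wired up, not the actual tool's implementation, and in a real pipeline the values would come from a background-subtraction mask rather than a hard-coded list.

```python
def detect_shot_boundaries(fg_percents, min_percent=1.0, max_percent=10.0, warmup=200):
    """Return the frame indices where a motion event starts.

    A frame is 'active' when its foreground-pixel percentage lies in
    [min_percent, max_percent] after the warm-up period; a boundary is
    emitted at each inactive-to-active transition. Frames above
    max_percent are ignored, so sudden global changes are not boundaries.
    """
    boundaries, prev_active = [], False
    for i, pct in enumerate(fg_percents):
        active = i >= warmup and min_percent <= pct <= max_percent
        if active and not prev_active:
            boundaries.append(i)
        prev_active = active
    return boundaries

# Synthetic foreground percentages, one value per frame; short warm-up for illustration
percents = [0.0] * 6 + [0.5, 3.0, 4.0, 0.2, 15.0, 5.0, 0.1]
print(detect_shot_boundaries(percents, warmup=5))  # [7, 11]
```

Note how frame 10 (15.0%) exceeds the upper threshold and is therefore not treated as a boundary, while the return to the 1–10% band at frame 11 is.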
Performance of the Proposed System
Processed Output Evaluation
| Input | Start time | Label | Probability |
|---|---|---|---|
| 1 | 00:00:02 | PENTAGON | 95.01% |
| 1 | 00:00:11 | PENTAGON_ERRONEOUS | 98.28% |
| 1 | 00:00:16 | PENTAGON_ERRONEOUS | 98.47% |
| 1 | 00:00:24 | None | 35.30% |
| 1 | 00:00:26 | PENTAGON_ERRONEOUS | 99.00% |
| 2 | 00:00:32 | PENTAGON | 95.02% |
| 2 | 00:00:38 | HOUSE_ERRONEOUS | 98.60% |
| 2 | 00:00:46 | HOUSE_ERRONEOUS | 98.76% |
| 2 | 00:00:52 | HOUSE_ERRONEOUS | 98.35% |
| 2 | 00:00:54 | HOUSE | 94.06% |
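For downstream analysis, each row of this output can be turned into a machine-readable record by converting the HH:MM:SS start time to seconds. The helper below is a hypothetical post-processing sketch, not part of the described system:

```python
def to_seconds(timestamp):
    """Convert an HH:MM:SS string to total seconds."""
    h, m, s = (int(part) for part in timestamp.split(":"))
    return 3600 * h + 60 * m + s

# A couple of rows from the table above as (start_time, label, probability)
rows = [
    ("00:00:02", "PENTAGON", 0.9501),
    ("00:00:11", "PENTAGON_ERRONEOUS", 0.9828),
]
records = [(to_seconds(t), label, prob) for t, label, prob in rows]
print(records[0])  # (2, 'PENTAGON', 0.9501)
```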
Multiclass Results
| Layer | Output shape | Param |
|---|---|---|
| Input_12 (input layer) | (MAX_SEQ_LENGTH, NUM_FEATURES) | 0 |
| Positional embedding | (MAX_SEQ_LENGTH, NUM_FEATURES) | 61,720 |
| Transformer_layer encoder | (MAX_SEQ_LENGTH, NUM_FEATURES) | 4,211,716 |
| GlobalMaxPooling1d_5 | (NUM_FEATURES) | 0 |
| Dropout_5 | (NUM_FEATURES) | 0 |
| Dense_17 | (NUM_CLASSES) | 12,300 |
| Hyperparameter | Value | Description |
|---|---|---|
| MAX_SEQ_LENGTH | 60 | Maximum length of the input sequence that the model can process |
| NUM_FEATURES | 1024 | Number of features per time step or spatial area |
| IMG_SIZE | 128 | Dimension of the input images in pixels (128 × 128) |
| EPOCHS | 120 | Total number of complete training cycles on the training data |
| NUM_CLASSES | 12 | Total number of output classes |
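The parameter counts in the layer table can be sanity-checked against these hyperparameters: a dense output layer over NUM_FEATURES inputs has (NUM_FEATURES + 1) × NUM_CLASSES trainable weights, the +1 accounting for the bias term. A quick check:

```python
NUM_FEATURES, NUM_CLASSES = 1024, 12

# Dense layer: one weight per input feature per class, plus one bias per class
dense_params = (NUM_FEATURES + 1) * NUM_CLASSES
print(dense_params)  # 12300, matching the Dense_17 row above

# The same formula with 2 output classes gives 2050, matching the binary model's Dense_17 layer
binary_dense_params = (NUM_FEATURES + 1) * 2
print(binary_dense_params)  # 2050
```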
Evaluation of the Proposed Framework for Multiclass Problem
| Classes | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 20 epochs | | | | |
| CIRCLE | 0.86 | 0.86 | 0.86 | 21 |
| CIRCLE_ERRONEOUS | 0.30 | 0.73 | 0.43 | 22 |
| HOUSE | 0.74 | 0.51 | 0.61 | 45 |
| HOUSE_ERRONEOUS | 0.74 | 0.76 | 0.75 | 67 |
| LINE | 0.93 | 0.56 | 0.70 | 25 |
| LINE_ERRONEOUS | 0.94 | 0.59 | 0.73 | 27 |
| PENTAGON | 0.77 | 0.79 | 0.78 | 34 |
| PENTAGON_ERRONEOUS | 0.51 | 0.74 | 0.60 | 34 |
| SPIRAL | 0.77 | 0.74 | 0.75 | 27 |
| SPIRAL_ERRONEOUS | 0.51 | 0.67 | 0.58 | 27 |
| VERTICAL_SPIRAL | 0.93 | 0.67 | 0.78 | 21 |
| VERTICAL_SPIRAL_ERRONEOUS | 0.89 | 0.32 | 0.47 | 25 |
| 50 epochs | | | | |
| CIRCLE | 1.00 | 0.81 | 0.89 | 21 |
| CIRCLE_ERRONEOUS | 0.48 | 0.73 | 0.58 | 22 |
| HOUSE | 0.41 | 0.78 | 0.53 | 45 |
| HOUSE_ERRONEOUS | 0.65 | 0.48 | 0.55 | 67 |
| LINE | 1.00 | 0.72 | 0.84 | 25 |
| LINE_ERRONEOUS | 0.89 | 0.59 | 0.71 | 27 |
| PENTAGON | 0.82 | 0.68 | 0.74 | 34 |
| PENTAGON_ERRONEOUS | 0.67 | 0.53 | 0.59 | 34 |
| SPIRAL | 0.75 | 0.78 | 0.76 | 27 |
| SPIRAL_ERRONEOUS | 0.46 | 0.63 | 0.53 | 27 |
| VERTICAL_SPIRAL | 1.00 | 0.81 | 0.89 | 21 |
| VERTICAL_SPIRAL_ERRONEOUS | 0.88 | 0.60 | 0.71 | 25 |
| 100 epochs | | | | |
| CIRCLE | 1.00 | 0.71 | 0.83 | 21 |
| CIRCLE_ERRONEOUS | 0.42 | 0.64 | 0.51 | 22 |
| HOUSE | 0.43 | 0.51 | 0.47 | 45 |
| HOUSE_ERRONEOUS | 0.58 | 0.73 | 0.64 | 67 |
| LINE | 0.00 | 0.00 | 0.00 | 25 |
| LINE_ERRONEOUS | 0.50 | 0.63 | 0.56 | 27 |
| PENTAGON | 0.82 | 0.68 | 0.74 | 34 |
| PENTAGON_ERRONEOUS | 0.39 | 0.71 | 0.50 | 34 |
| SPIRAL | 0.89 | 0.30 | 0.44 | 27 |
| SPIRAL_ERRONEOUS | 0.46 | 0.67 | 0.55 | 27 |
| VERTICAL_SPIRAL | 1.00 | 0.33 | 0.50 | 21 |
| VERTICAL_SPIRAL_ERRONEOUS | 0.70 | 0.28 | 0.40 | 25 |
| 120 epochs | | | | |
| CIRCLE | 0.95 | 0.90 | 0.93 | 21 |
| CIRCLE_ERRONEOUS | 0.57 | 0.55 | 0.56 | 22 |
| HOUSE | 0.52 | 0.62 | 0.57 | 45 |
| HOUSE_ERRONEOUS | 0.67 | 0.72 | 0.69 | 67 |
| LINE | 1.00 | 0.80 | 0.89 | 25 |
| LINE_ERRONEOUS | 0.84 | 0.59 | 0.70 | 27 |
| PENTAGON | 0.71 | 0.59 | 0.65 | 34 |
| PENTAGON_ERRONEOUS | 0.59 | 0.56 | 0.58 | 34 |
| SPIRAL | 0.87 | 0.74 | 0.80 | 27 |
| SPIRAL_ERRONEOUS | 0.51 | 0.78 | 0.62 | 27 |
| VERTICAL_SPIRAL | 0.90 | 0.86 | 0.88 | 21 |
| VERTICAL_SPIRAL_ERRONEOUS | 0.80 | 0.80 | 0.80 | 25 |
| 150 epochs | | | | |
| CIRCLE | 0.90 | 0.86 | 0.88 | 21 |
| CIRCLE_ERRONEOUS | 0.71 | 0.45 | 0.56 | 22 |
| HOUSE | 0.68 | 0.67 | 0.67 | 45 |
| HOUSE_ERRONEOUS | 0.76 | 0.75 | 0.75 | 67 |
| LINE | 0.96 | 0.88 | 0.92 | 25 |
| LINE_ERRONEOUS | 0.92 | 0.85 | 0.88 | 27 |
| PENTAGON | 0.69 | 0.91 | 0.78 | 34 |
| PENTAGON_ERRONEOUS | 0.67 | 0.65 | 0.66 | 34 |
| SPIRAL | 0.88 | 0.78 | 0.82 | 27 |
| SPIRAL_ERRONEOUS | 0.55 | 0.85 | 0.67 | 27 |
| VERTICAL_SPIRAL | 0.95 | 0.86 | 0.90 | 21 |
| VERTICAL_SPIRAL_ERRONEOUS | 0.95 | 0.76 | 0.84 | 25 |
- There is a discernible trend of precision improvement for many classes as epochs increase, especially for “LINE”, “LINE_ERRONEOUS”, “VERTICAL_SPIRAL”, and “VERTICAL_SPIRAL_ERRONEOUS”, suggesting enhanced prediction confidence with extended training.
- The anomaly at 100 epochs, particularly for “SPIRAL” and “VERTICAL_SPIRAL”, indicates potential overfitting or ineffective learning past a certain training threshold.
- By 150 epochs, a majority of the classes exhibit higher F1-scores than at earlier epochs, indicative of a balanced improvement in both precision and recall.
- The model appears to strike an optimal balance between precision and recall at 150 epochs, as reflected by high F1-scores. For example, the “LINE” and “LINE_ERRONEOUS” classes achieve high precision and recall, culminating in high F1-scores, meaning the model is both precise and reliable for these classifications.
- The “PENTAGON” class demonstrates a notable rise in recall between 120 and 150 epochs, suggesting improved capability in identifying all relevant instances at the latter epoch.
- Some classes deviate from this trend, however, such as “CIRCLE_ERRONEOUS”, which sees a recall dip from 120 to 150 epochs, potentially due to a conservative bias that misses true positives in its quest to minimize false positives.
- The “HOUSE_ERRONEOUS” class maintains consistently high and stable scores across epochs, indicating reliable detection of erroneous drawing realizations for this category.
- The “LINE” class shows 100% precision at 50 and 120 epochs, an ideal scenario, yet its recall is less than perfect, suggesting that while the model’s predictions are highly accurate, it does not consistently detect all instances of “LINE”.
- Support refers to the number of occurrences of each class in the dataset. In a classification problem, each class has a corresponding support value, indicating how many instances of that class exist in the dataset. Support is particularly relevant in multi-class classification scenarios, helping to understand the distribution of classes and their representation. The support figures remain unchanged across epochs, ensuring a consistent dataset size for each class and a fair comparison.
- The zero precision and recall for the “LINE” class at 100 epochs raise concerns, possibly pointing to a training anomaly or a total detection failure at this stage.
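For reference, the F1-scores in the table are the harmonic mean of precision and recall, so a class only scores high when both quantities are high:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A class with precision 0.75 and recall 0.60 scores well below either value alone
print(round(f1_score(0.75, 0.60), 3))  # 0.667
```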
| Epochs | Accuracy | Support |
|---|---|---|
| 20 | 67% | 375 |
| 50 | 65% | 375 |
| 100 | 55% | 375 |
| 120 | 70% | 375 |
| 150 | 77% | 375 |
Binary Classification Results
- Pentagon (170 videos)
- House (225 videos)
- Vertical spiral (105 videos)
- Line (135 videos)
- Spiral (127 videos)
- Circle (107 videos)
- Pentagon erroneous (170 videos)
- House erroneous (336 videos)
- Vertical spiral erroneous (124 videos)
- Line erroneous (129 videos)
- Spiral erroneous (135 videos)
- Circle erroneous (112 videos)
| Layer | Output shape | Param |
|---|---|---|
| Input_12 (input layer) | (MAX_SEQ_LENGTH, NUM_FEATURES) | 0 |
| Positional embedding | (MAX_SEQ_LENGTH, NUM_FEATURES) | 61,440 |
| Transformer_layer encoder | (MAX_SEQ_LENGTH, NUM_FEATURES) | 4,211,716 |
| GlobalMaxPooling1d_5 | (NUM_FEATURES) | 0 |
| Dropout_5 | (NUM_FEATURES) | 0 |
| Dense_17 | (NUM_CLASSES) | 2050 |
| Hyperparameter | Value | Description |
|---|---|---|
| MAX_SEQ_LENGTH | 60 | Maximum length of the input sequence that the model can process |
| NUM_FEATURES | 1024 | Number of features per time step or spatial area |
| IMG_SIZE | 128 | Dimension of the input images in pixels (128 × 128) |
| EPOCHS | 120 | Total number of complete training cycles on the training data |
| NUM_CLASSES | 2 | Total number of output classes (HEALTHY vs NOT_HEALTHY); consistent with the 2050 parameters of the Dense_17 layer |
Evaluation of the Proposed Framework for Binary Classification
| Classes | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 150 epochs | | | | |
| HEALTHY | 0.78 | 0.74 | 0.76 | 150 |
| NOT_HEALTHY | 0.78 | 0.82 | 0.80 | 171 |
| 200 epochs | | | | |
| HEALTHY | 0.84 | 0.77 | 0.80 | 150 |
| NOT_HEALTHY | 0.81 | 0.87 | 0.84 | 171 |
| 500 epochs | | | | |
| HEALTHY | 0.86 | 0.75 | 0.80 | 150 |
| NOT_HEALTHY | 0.80 | 0.89 | 0.84 | 171 |
| 800 epochs | | | | |
| HEALTHY | 0.70 | 0.72 | 0.71 | 150 |
| NOT_HEALTHY | 0.75 | 0.73 | 0.74 | 171 |
- Area under the curve (AUC): The AUC is 0.84, which denotes a robust discriminative capacity of the model. Generally, an AUC between 0.8 and 0.9 is considered very good, suggesting that the model is well calibrated and can distinguish between “healthy” and “not healthy” subjects with a high degree of probability.
- Trade-off between TPR and FPR: The curve illustrates that as the true positive rate increases (more “not healthy” subjects correctly identified), the false positive rate (more “healthy” subjects incorrectly flagged as “not healthy”) also increases, which is typical in binary classification settings. The goal is to maximize the TPR while minimizing the FPR.
- Model performance: For the most part, the curve lies well above the dashed diagonal line that represents random classification (AUC = 0.5), indicating that the model performs significantly better than chance.
- Optimal threshold: The specific point along the curve considered “optimal” depends on the application context and the desired balance between TPR and FPR. In some medical applications, for instance, it may be preferable to minimize false negatives (thereby maximizing sensitivity), even at the expense of accepting more false positives.
- True negatives (TN): The model correctly identified 115 cases as “healthy”, indicating that it is fairly reliable in recognizing the absence of “not healthy” conditions.
- False positives (FP): There were 35 instances where the model incorrectly classified “healthy” cases as “not healthy”. Such errors can be problematic in medical contexts, as they may lead to unnecessary further testing or treatment.
- False negatives (FN): The model failed to identify 22 “not healthy” cases, erroneously classifying them as “healthy”. These errors are often considered more serious in clinical contexts, as they imply missed treatment of a condition that requires attention.
- True positives (TP): With 149 cases correctly identified as “not healthy”, the model demonstrates a good ability to detect conditions when they are indeed present.
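Taking “not healthy” as the positive class, the four confusion-matrix counts above reproduce the headline metrics directly (the derived values agree with the 200-epoch figures in the per-class report above):

```python
TN, FP, FN, TP = 115, 35, 22, 149  # counts from the confusion matrix above

accuracy = (TP + TN) / (TP + TN + FP + FN)  # 264 / 321 ≈ 0.82
precision = TP / (TP + FP)                  # ≈ 0.81 for the "not healthy" class
recall = TP / (TP + FN)                     # sensitivity, ≈ 0.87
specificity = TN / (TN + FP)                # ≈ 0.77, recall of the "healthy" class

print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(specificity, 2))
# 0.82 0.81 0.87 0.77
```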
| Epochs | Accuracy | Support |
|---|---|---|
| 150 | 78% | 321 |
| 200 | 82% | 321 |
| 500 | 83% | 321 |
| 800 | 72% | 321 |
Discussion and Conclusions
| Reference | Methodology | Accuracy |
|---|---|---|
| Kruthika et al. [30] | Gaussian Naive Bayes | 93% |
| Kruthika et al. [30] | K-nearest neighbor (KNN) | 93% |
| Kruthika et al. [30] | Support vector machine (SVM) | 93% |
| Kruthika et al. [30] | SVM + KNN + PSO | 93% |
| Liu et al. [31] | Cascaded CNNs for AD diagnosis | 93% |
| Payan et al. [32] | DNN with sparse AE and CNN | 89% |
| Sarraf et al. [33] | CNN MRI | 98% |
| Sarraf et al. [33] | CNN fMRI | 99% |
| Erdas et al. [29] | 3DCNN | 96% |
| Erdas et al. [29] | ConvLSTM | 95% |
| This work | Shot Boundary Transformer | 83% |
- The existing literature relies on more sophisticated signals such as human gait, MRI, fMRI, and PET. These signals require expensive hardware, whereas our system relies on a simple, inexpensive frontal camera mounted on the head.
- Experimental results in the existing literature are based on homogeneous databases, with a single pathology and ground truth obtained from a medical diagnosis. In our case, we probably have a set of diverse pathologies; a medical diagnosis is not available, and the ground truth is derived from a single test, the pentagon copying test.
- We did not systematically record the medications taken by participants; pharmaceutical interventions can influence cognitive performance and may have introduced variability into our results. Additionally, the absence of information on participants’ previous medical history, such as a history of stroke, introduces a potential confounding factor that was not accounted for in our analysis.
- A subset of individuals in our study exhibited difficulties with writing, potentially attributable to a limited educational background.
- The use of tablets for handwriting tasks may be unfamiliar to elderly individuals, potentially influencing their performance on such devices.