Introduction

Alzheimer's disease (AD) and frontotemporal dementia (FTD) are major diseases underlying dementia, especially in younger patients (age < 65 years) [1]. Establishing an accurate diagnosis in the early stage of the disease can be difficult. Although clinical symptomatology differs between the diseases, symptoms in the early stage may be unclear and can overlap [2, 3]. The current clinical criteria, which entail qualitative inspection of neuroimaging, fail to accurately differentiate AD from FTD [4]. However, early and accurate differential diagnosis of AD and FTD is very important, mainly because it gives patients access to supportive therapies [5, 6]. In addition, early diagnosis supports new research into understanding the disease process and developing new treatments [5, 6].

For this difficult differential diagnosis between AD and FTD, methods for computer-aided diagnosis may be beneficial. These methods use multivariate data analysis techniques to train a model (classifier) on neuroimaging or related data, resulting in an objective diagnosis. In addition, computer-aided diagnosis can be more accurate than clinical criteria alone [7], as it can exploit subtle group differences. Using structural T1-weighted (T1w) MRI to detect characteristic patterns of brain atrophy, computer-aided diagnosis methods have yielded accuracies of up to 84% for differentiating AD from FTD [8–10].

Besides structural MRI, evidence of neurodegeneration can be measured with advanced MRI techniques such as arterial spin labelling (ASL) and diffusion tensor imaging (DTI). ASL can non-invasively measure brain perfusion in terms of cerebral blood flow (CBF) [11, 12]. Recent studies have shown differences in perfusion patterns between FTD and AD, indicating that this technique is promising for differential diagnosis [13–16]. In addition, some classification studies showed an added value of ASL over atrophy measurements for AD diagnosis in individual patients, although others did not [13, 17–19]. Using DTI, the fractional anisotropy (FA) can be quantified, which is related to the degradation of white matter (WM) bundles. WM degradation has been shown to be more prominent in FTD than in AD, especially in frontal brain regions [14, 20, 21]. In classification studies, DTI generally shows a slight added value to atrophy measurements [22–28].

As ASL and DTI measure aspects of the neurodegenerative process that are different from brain volume changes, we hypothesise that these techniques have an added diagnostic value over structural MRI. Although ASL and DTI have been shown to be potential markers for differential diagnosis of AD and FTD, their combined added value for computer-aided differential diagnosis has not yet been evaluated. This study aims to investigate the added diagnostic value of ASL and DTI to structural MRI for classification of AD, FTD, and controls.

Materials and methods

Participants

We retrospectively included 24 AD patients, 33 FTD patients, and 34 cognitively normal (CN) controls. Patients who visited the memory clinic of our institution between February 2011 and June 2015 were considered for inclusion. Patients underwent neurological and neuropsychological examination as part of their diagnostic work-up. Patients with a Mini-Mental State Examination (MMSE) score ≥ 20 were included if they had undergone MR imaging with a standardised protocol including structural T1w MRI, ASL, and DTI. Patients with psychiatric or neurological disorders other than dementia were excluded. The reference standard was a diagnosis of AD or FTD established by consensus of a multidisciplinary team according to the clinical criteria [2, 3, 29]. Controls were recruited from patient peers and through advertisement, and had no memory complaints, history of neurological or psychiatric disease, or contra-indications for MRI.

This study was approved by the local medical ethics committee. Eighty-seven participants signed informed consent; consent from the remaining four patients was waived because of the retrospective nature of the study.

Image acquisition and processing

MR imaging was performed at 3 T with 8-channel head coils on two identical scanners (Discovery MR750; GE Healthcare, Milwaukee, WI, USA). The protocol included T1w, ASL, and DTI. High-resolution isotropic T1w images were acquired with 3D inversion recovery fast spoiled gradient-recalled echo. According to the recommendations for ASL [12], we acquired 3D pseudo-continuous ASL perfusion-weighted images and a separate proton-density image for scaling. DTI used 2D single-shot echo planar imaging in 25 non-collinear directions [30]. Detailed parameters are listed in Table 1.

Table 1 MRI acquisition parameters

For image processing, the Iris pipeline [19] was applied to obtain voxel-based measures of structural MRI, ASL, and DTI (see Appendix A for a detailed description). From structural MRI, we derived tissue segmentations (WM, grey matter [GM], and cerebrospinal fluid) and a brain mask. In a group template space, we derived features based on voxel-based morphometry (VBM) within a mask of 1) the GM (VBM-GM), 2) the WM (VBM-WM), and 3) the supratentorial brain (VBM-Brain). For ASL, CBF was quantified using a single-compartment model and partial volume correction. The CBF voxel values of the GM in template space were used as features for classification. For DTI, tensor fits were performed to derive FA maps, and the FA voxel values of the WM in template space were used as features for classification.
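The two voxel-wise quantities above can be illustrated with a minimal sketch. The CBF function uses the single-compartment pseudo-continuous ASL equation of the consensus recommendations [12], with the recommended constants (blood-brain partition coefficient λ = 0.9 mL/g, blood T1 = 1.65 s at 3 T, labelling efficiency α = 0.85); the post-labelling delay and labelling duration defaults are placeholders, not the settings of Table 1, and partial volume correction is omitted. FA is computed from the three eigenvalues of a fitted diffusion tensor, which are assumed to be given. Function names are illustrative and not part of the Iris pipeline.

```python
import math


def pcasl_cbf(delta_m, m0, pld=1.8, tau=1.8, t1_blood=1.65, alpha=0.85, lam=0.9):
    """Single-compartment pCASL CBF in mL/100 g/min (consensus formula).

    delta_m  : control minus label signal (perfusion-weighted value)
    m0       : proton-density signal used for scaling
    pld, tau : post-labelling delay and labelling duration in seconds
               (placeholder values, NOT the study protocol)
    """
    # 6000 converts mL/g/s to mL/100 g/min; exp(pld / t1_blood) undoes T1 decay of the label
    numerator = 6000.0 * lam * delta_m * math.exp(pld / t1_blood)
    denominator = 2.0 * alpha * t1_blood * m0 * (1.0 - math.exp(-tau / t1_blood))
    return numerator / denominator


def fractional_anisotropy(l1, l2, l3):
    """FA from the diffusion tensor eigenvalues: 0 = isotropic, 1 = purely linear."""
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    return math.sqrt(0.5 * num / den) if den > 0 else 0.0
```

In a real pipeline both functions would be applied voxel-wise to the registered maps, after which only GM voxels (CBF) or WM voxels (FA) in template space are kept as classification features.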

Quality control

The following images were visually inspected (E.E.B., 5 years of experience): GM segmentation, WM segmentation, brain mask, template space registration, ASL registered to structural MRI, CBF map, DTI registered to structural MRI, and FA map. Any errors in the image processing were corrected until visual inspection revealed no more unacceptable results.

Analysis and statistics

Classifications of AD versus CN (AD-CN), FTD-CN, and AD-FTD were performed with linear support-vector-machine (SVM) classifiers [31]. The SVM C-parameter was optimised in cross-validation on the training set. Classifiers were trained on VBM-GM, VBM-WM, VBM-Brain, CBF, and FA features separately. For combination of multiple parameters, the classifiers were combined by averaging posterior probabilities [32]. The following multi-parametric classifiers were trained:

  • GM combination: VBM-GM and CBF

  • WM combination: VBM-WM and FA

  • Full combination: VBM-Brain, CBF, and FA
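The classifier-level combination by averaging can be sketched as follows. It assumes that each single-modality SVM already outputs a posterior probability per participant (for example via Platt scaling; the mapping used is not specified here), and the modality names used as dictionary keys are illustrative.

```python
def average_posteriors(per_modality):
    """Classifier-level fusion: mean of modality-wise posteriors P(patient | x).

    per_modality: dict mapping a modality name (e.g. "VBM-Brain", "CBF", "FA")
    to a list of posterior probabilities for the same participants, in the same order.
    """
    lists = list(per_modality.values())
    n = len(lists[0])
    if any(len(p) != n for p in lists):
        raise ValueError("all modalities must score the same participants")
    # Average participant by participant across modalities
    return [sum(p[i] for p in lists) / len(lists) for i in range(n)]
```

For the Full combination, the dictionary would hold the VBM-Brain, CBF, and FA posteriors; the GM and WM combinations use two entries each. Averaging keeps the fused output on the probability scale, so the same threshold and AUC computation apply as for single-modality classifiers.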

For multi-class classification (AD-FTD-CN), pairwise classifiers were combined by multiplying the posterior probabilities. Using 50 iterations of fourfold cross-validation, we computed the mean and standard deviation of the area under the receiver operating characteristic curve (AUC) and of the accuracy. The multi-class AUC was evaluated over pairs of classes [33], and the multi-class accuracy was the proportion of correctly classified participants.
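The multi-class step and its evaluation can be sketched as below. The exact multiplication rule is an assumption: each class score is taken as the product of that class's posteriors from the two pairwise classifiers in which it appears, renormalised to sum to one. The AUC follows Hand and Till's pairwise definition, assumed here to correspond to [33]; accuracy is the fraction of participants whose highest-posterior class matches the reference diagnosis.

```python
from itertools import combinations

CLASSES = ("AD", "FTD", "CN")


def fuse_pairwise(p_ad_cn, p_ftd_cn, p_ad_ftd):
    """Combine pairwise posteriors into three-class posteriors by multiplication.

    p_ad_cn  : P(AD | x) from the AD-CN classifier (so P(CN | x) = 1 - p_ad_cn)
    p_ftd_cn : P(FTD | x) from the FTD-CN classifier
    p_ad_ftd : P(AD | x) from the AD-FTD classifier
    """
    scores = {
        "AD": p_ad_cn * p_ad_ftd,
        "FTD": p_ftd_cn * (1 - p_ad_ftd),
        "CN": (1 - p_ad_cn) * (1 - p_ftd_cn),
    }
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def _a_hat(pos_scores, neg_scores):
    """P(random positive scores higher than random negative); ties count 1/2."""
    wins = 0.0
    for s_pos in pos_scores:
        for s_neg in neg_scores:
            wins += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def multiclass_auc(labels, posteriors, classes=CLASSES):
    """Hand and Till's M: mean over class pairs of the two-way averaged AUC."""
    pairs = list(combinations(classes, 2))
    total = 0.0
    for ci, cj in pairs:
        in_i = [p for y, p in zip(labels, posteriors) if y == ci]
        in_j = [p for y, p in zip(labels, posteriors) if y == cj]
        a_ij = _a_hat([p[ci] for p in in_i], [p[ci] for p in in_j])  # rank by P(ci)
        a_ji = _a_hat([p[cj] for p in in_j], [p[cj] for p in in_i])  # rank by P(cj)
        total += (a_ij + a_ji) / 2
    return total / len(pairs)


def accuracy(labels, posteriors):
    """Fraction of participants whose highest-posterior class equals the label."""
    hits = sum(max(p, key=p.get) == y for y, p in zip(labels, posteriors))
    return hits / len(labels)
```

Because the AUC only uses how posteriors rank participants within each pair of classes, it is insensitive to the renormalisation, whereas accuracy depends on which class receives the highest fused posterior.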

Differences in mean AUC and accuracy were tested: 1) CBF versus VBM-GM, 2) FA versus VBM-WM, 3) GM combination versus VBM-GM, 4) WM combination versus VBM-WM, 5) Full combination versus VBM-Brain. This was done using non-parametric permutation tests: the difference in performance of the two classifications was compared (α ≤ 0.05) to a null distribution that was estimated using 500 permutations in which the labels were randomly distributed over the samples.
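A generic version of this permutation test is sketched below. The callback `performance_difference` is hypothetical: it stands for re-running the cross-validated comparison (e.g. mean AUC of the Full combination minus that of VBM-Brain) with a given label assignment. The one-sided p-value counts how often a permuted difference is at least as large as the observed one, including the common +1 correction so that p is never zero.

```python
import random


def permutation_p_value(labels, performance_difference, n_permutations=500, seed=0):
    """One-sided permutation p-value for an improvement of classifier B over A.

    performance_difference(labels) -> float is a hypothetical callback returning
    performance(B) - performance(A) when both are trained and evaluated with the
    given labels (e.g. the difference in mean AUC over cross-validation iterations).
    """
    rng = random.Random(seed)
    observed = performance_difference(list(labels))
    exceed = 0
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # randomly redistribute the labels over the samples
        if performance_difference(shuffled) >= observed:
            exceed += 1
    # +1 in numerator and denominator counts the observed labelling itself
    return (exceed + 1) / (n_permutations + 1)
```

With 500 permutations, the smallest attainable p-value is 1/501 (about 0.002), which is well below the α ≤ 0.05 level used here.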

For detection of features that contributed significantly to the SVM, we calculated statistical significance maps (p-maps). These maps were computed on all data using an analytical expression that approximates permutation testing [34]. Clusters of significant voxels were obtained by applying a slightly conservative p value threshold (α ≤ 0.01). We did not correct for multiple comparisons, as permutation testing has a low false-positive detection rate [35]. The clusters’ locations were identified by visual inspection.
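The analytical p-maps can be sketched as follows, assuming that [34] refers to the approximation for a hard-margin linear SVM in which every training sample is a support vector, which is typical when voxels far outnumber participants. In that regime the weight vector is a linear function of the labels, w = C y, so its null distribution under random relabelling has a closed-form mean (zero) and variance. This sketch ignores the C-parameter and is not the authors' implementation.

```python
import math

import numpy as np


def svm_pmap(X, y):
    """Analytic approximation of permutation p-values for linear SVM weights.

    X: (n_samples, n_features) feature matrix, e.g. voxel values.
    y: labels in {-1, +1}.
    Returns the weight vector w and a two-sided p-value per feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    K = X @ X.T                      # (n, n) linear kernel (Gram) matrix
    K_inv = np.linalg.inv(K)
    J = np.ones((n, 1))
    K_inv_J = K_inv @ J
    # Margin constraints X w + b = y together with sum(beta) = 0 give w = C y
    M = K_inv - K_inv_J @ np.linalg.inv(J.T @ K_inv_J) @ K_inv_J.T
    C = X.T @ M                      # (n_features, n_samples)
    w = C @ y
    # Under random relabelling each y_i has variance 4p(1 - p), p = fraction of +1
    # labels; E[w_j] = (2p - 1) * sum_i C_ji, which is zero because C @ 1 = 0.
    p_pos = np.mean(y > 0)
    sd = np.sqrt(4 * p_pos * (1 - p_pos) * np.sum(C ** 2, axis=1))
    z = w / sd
    p_values = np.array([math.erfc(abs(zj) / math.sqrt(2)) for zj in z])
    return w, p_values
```

Voxels with `p_values <= 0.01` would then form the significance maps, with clusters identified afterwards.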

Results

Participants

The inclusion of participants is visualised in Fig. 1. Table 2 shows the demographics and MMSE scores of the participants (24 AD, 33 FTD, 34 CN). Four patients were excluded because of poor ASL data quality, i.e. motion artefacts or noise. The included FTD subtypes were behavioural variant FTD (bvFTD, n = 12), primary progressive aphasia (PPA, n = 16, including ten with semantic dementia [SD] and four with progressive non-fluent aphasia [PNFA]), and unknown subtype (n = 5). In the AD group, six patients had <1 year of follow-up (range 0–7 months), and the diagnosis of 18 patients was confirmed by >1 year of follow-up (range 12–45 months). In the FTD group, 12 patients had <1 year of follow-up (range 0–11 months), and 21 patients had >1 year of follow-up (range 12–47 months).

Fig. 1

Flow of participants: a) patients with Alzheimer’s disease (AD), b) patients with frontotemporal dementia (FTD)

Table 2 Participant demographics

Classification results

Figure 2 shows the classification performance using T1w, ASL, and DTI voxel-wise features (Fig. 2a: AUC; 2b: accuracy). Table 3 shows non-parametric testing for significant differences between classifications.

Fig. 2

Area under the ROC curve (AUC) (a) and accuracy (b). The error bars show the standard deviation of 50 iterations of fourfold cross-validation. An asterisk (*) indicates a significant improvement over the classification using VBM features only (permutation test, p ≤ 0.05)

Table 3 P values of the non-parametric permutation tests to test statistical differences between classifiers based on a) mean area under the ROC curve (AUC) and b) mean accuracy

For AD-CN classification, mean AUCs were 92% (VBM-GM), 87% (VBM-WM), 94% (VBM-Brain), 89% (CBF), 89% (FA), 95% (GM combination), 91% (WM combination), and 98% (Full combination). Classification accuracy was slightly lower than AUC in general. The performance using CBF and FA features was similar to that of the VBM features. The feature combinations yielded slightly higher performance than the VBM features, but differences were not significant.

For FTD-CN classification, AUCs using VBM were somewhat higher than for AD-CN, but combination with FA and CBF did not improve performance. AUCs were 95% (VBM-GM), 96% (VBM-WM), 95% (VBM-Brain), 87% (CBF), 91% (FA), 93% (GM combination), 95% (WM combination), and 96% (Full combination).

For differential diagnosis of AD versus FTD, AUCs were 78% (VBM-GM), 76% (VBM-WM), 72% (VBM-Brain), 81% (CBF), 80% (FA), 84% (GM combination), 81% (WM combination), and 84% (Full combination). Combination with CBF and FA features improved performance over the use of VBM features only.

For multi-class diagnosis of AD, FTD, and CN, mean AUCs were 85% (VBM-GM), 83% (VBM-WM), 84% (VBM-Brain), 82% (CBF), 83% (FA), 87% (GM combination), 85% (WM combination), and 90% (Full combination). Classification accuracy was lower, but it should be noted that for this three-class diagnosis, the accuracy of random guessing would be only ~33%. For multi-class classification, AUCs were highest for the combination methods. The method that combined VBM-Brain with CBF and FA yielded a significantly higher AUC (90 vs. 84%, p = 0.03) and accuracy (75 vs. 70%, p = 0.05) than VBM-Brain by itself. This is reflected in the example confusion matrices for one iteration of the cross-validation (Appendix C; Table C1), which show more correctly classified patients and controls for the Full combination than for VBM-Brain. However, combining VBM with ASL or DTI may also reduce the number of correctly classified patients in a given group; e.g. the GM combination correctly classified fewer FTD patients than VBM-GM, even though its overall accuracy was higher.

Significance maps

Using SVM p-maps (Figs. 3, 4, and 5; Appendix B, Figs. B1 and B2), we evaluated which voxels contributed significantly to the classifications. For VBM-GM (Fig. 3), the perihippocampal region had a major influence on the classifier, and overall we observed more significant voxels in the left than in the right hemisphere. For differential diagnosis of AD-FTD, mainly voxels in the anterior temporal lobe were involved.

Fig. 3

SVM significance maps for voxel-based morphometry of the grey matter (VBM-GM): a) AD-CN, b) FTD-CN, c) AD-FTD. Colour overlay shows p values ≤ 0.01

Fig. 4

SVM significance maps for cerebral blood flow (CBF): a) AD-CN, b) FTD-CN, c) AD-FTD. Colour overlay shows p values ≤ 0.01

Fig. 5

SVM significance maps for fractional anisotropy (FA): a) AD-CN, b) FTD-CN, c) AD-FTD. Colour overlay shows p values ≤ 0.01

For VBM-WM (Fig. B1), we observed most clusters of significantly contributing voxels in the temporal lobe and around the ventricles. For AD-CN and FTD-CN classification, a smaller cluster of significant voxels in the corpus callosum was found. The temporal lobe clusters were present mainly in the left hemisphere, especially for AD-FTD differentiation.

For VBM-Brain (Fig. B2), p-maps were very smooth because this feature is formed by the Jacobian determinant of the spatially smooth deformation to template space. In VBM-GM and VBM-WM, this smoothness is lost because the Jacobian determinant is multiplied by the probabilistic tissue segmentations. For AD-CN, the classification was driven mainly by periventricular and left temporal lobe features. For FTD-CN, the temporal lobe contributed the largest clusters of significant voxels. For AD-FTD, small clusters were found in the middle frontal gyrus, temporal lobe, and periventricular regions.
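The relation between the three VBM features can be illustrated with a small sketch. It assumes that the deformation to template space is available as a dense array `phi` holding, for every voxel, its mapped coordinate in millimetres; the function names are hypothetical and not part of the Iris pipeline. VBM-Brain corresponds to the Jacobian determinant itself, whereas VBM-GM and VBM-WM multiply it by tissue probability maps that vary sharply at tissue boundaries, which explains the loss of smoothness.

```python
import numpy as np


def jacobian_determinant(phi, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise Jacobian determinant of a dense 3D deformation.

    phi: array of shape (3, X, Y, Z); phi[i] is the i-th mapped coordinate (mm)
    of every voxel. Partial derivatives use central differences (np.gradient).
    """
    jac = np.empty(phi.shape[1:] + (3, 3))
    for i in range(3):
        # np.gradient returns d phi_i / d x_j for j = 0, 1, 2
        for j, d in enumerate(np.gradient(phi[i], *spacing)):
            jac[..., i, j] = d
    return np.linalg.det(jac)


def modulate(tissue_probability, jac_det):
    """Modulated tissue map (e.g. VBM-GM): probability scaled by local volume change."""
    return tissue_probability * jac_det
```

A determinant above 1 indicates local expansion and below 1 local contraction; whether this corresponds to atrophy or to relative tissue excess depends on the direction of the deformation, which this sketch does not fix.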

For CBF (Fig. 4), p-maps showed small clusters of significant voxels in multiple brain regions. For AD-CN, significant voxels were observed mainly in the GM of the parietal lobe, precuneus, posterior cingulate gyrus, posterior temporal lobe and the insula. For FTD-CN, the main regions with significant voxels were the posterior cingulate gyrus, superior frontal gyrus, the straight gyrus, lingual gyrus and the putamen. For AD-FTD, the classification relied mainly on voxels from the posterior cingulate gyrus, parietal lobe, caudate nucleus, insula, temporal lobe and the cuneus.

For FA (Fig. 5), clusters of voxels in the corpus callosum and around the globus pallidus and putamen contributed significantly to the AD-CN classification. In addition, clusters of voxels in the visual and motor tracts contributed. For FTD-CN, the clusters of significant voxels were observed mainly in the anterior temporal lobe, the frontal WM, the corpus callosum, and language-associated tracts (uncinate fasciculus, superior longitudinal fasciculus). For the differential diagnosis of AD-FTD, fewer voxels were significant with only a cluster of significant voxels in the uncinate fasciculus.

Discussion

Combining voxel-based features of ASL and DTI with those of structural MRI improved differential diagnosis of early-onset AD and FTD, although the improvement was only borderline significant (p = 0.03–0.05). For AD-CN, AD-FTD, and multi-class classification, ASL and DTI by themselves yielded performance similar to or slightly higher than that of structural MRI; for FTD-CN classification, their performance was lower. While combining ASL and DTI with structural MRI improved differential diagnosis, no added value was observed for the classification of AD versus controls or for the classification of FTD versus controls.

Classification performance was similar to that previously published on other data sets for pairwise differentiation of AD and FTD [8, 9], and slightly higher than that for multi-class classification [9]. The combination of ASL and DTI for classification of AD, FTD, and controls has not been assessed before and therefore cannot be compared directly with the literature; the techniques have been applied separately to pairwise classifications. In concordance with our results, most studies using DTI obtained good classification performance [23, 24, 26, 27] but reported no significant improvement over structural MRI [22, 25, 28]. In contrast to our current and previous work [19], most ASL-based classification studies showed a significant added value to structural MRI [13, 17, 18]. This may partly be explained by the higher performance of structural MRI in our studies. Additionally, not all studies used cross-validation to avoid overestimating classification performance. For ASL, this overestimation might be larger than for structural MRI because of its lower signal-to-noise ratio and robustness. Conclusions obtained with and without cross-validation can therefore be expected to differ.

This work is, to the best of our knowledge, the first to perform multiparametric classification of structural MRI, ASL, and DTI. Multiparametric classification on other modalities has previously used feature-level combination (e.g. one large feature vector) or classifier-level combination (e.g. combining classifier posterior probabilities). In this study, we averaged posterior probabilities of the individual classifiers, since we had previously found this to outperform feature-level approaches [19].

The SVM significance maps showed that the brain regions contributing to the classifications corresponded to those associated with AD or FTD, which indicates that the classifiers make plausible decisions. For structural MRI, the temporal lobes showed large clusters of significant voxels. While the medial temporal lobe (i.e. hippocampus, amygdala) contributed largely to the classifications of AD versus controls and FTD versus controls, the differentiation between AD and FTD was based mainly on anterior temporal lobe features, which corresponds to the literature on atrophy in AD [27, 36–39] and FTD [27, 39]. ASL and DTI showed less influence of the temporal lobe. In the frontal and language-associated regions, DTI contributed to the classifications involving FTD. While frontal atrophy is expected in FTD [27, 39], no frontal lobe contribution was observed. ASL p-maps showed significant areas in the parietal lobe for classifications involving AD [40]. While parietal lobe atrophy is often proposed as a differential marker [10, 27, 39], we did not find significant clusters in the VBM p-maps, which is in agreement with many VBM studies, e.g. [10, 41]. In addition to the parietal lobe, CBF in the cingulate gyri and in subcortical structures (insula and caudate nucleus [40]) showed significant features for AD and FTD classification. Finally, DTI captured the contribution of the corpus callosum for all classifications [20, 21]. Since the clusters of voxels influencing the classifications were located in different brain regions for ASL and DTI than for structural MRI, these techniques are likely to depict neuropathological processes with a spatial distribution other than that of atrophy.

Both the improved performance for differential diagnosis and the involvement of different brain regions suggest that ASL and DTI add diagnostic value to structural MRI and could improve diagnosis of individual AD and FTD patients. However, the generally suboptimal image quality of these techniques, e.g. low signal-to-noise ratio, may have limited their diagnostic power when used separately. Similar to our findings, studies using data from the Alzheimer's Disease Neuroimaging Initiative 2 (ADNI 2) have shown that ASL and DTI separately provide information that is not available on structural MRI, but do not show better diagnostic power [42].

A limitation of this study is that the diagnosis was based on clinical criteria rather than post mortem histopathological examination. Although diagnosis was typically confirmed by follow-up, it is possible that some of the patients were misdiagnosed. Additionally, the size of our data set (24 AD, 33 FTD, 34 controls) was modest albeit comparable to that of other studies. Studies performing classification of AD and FTD using structural MRI data are typically of similar size [9, 13] (only larger in [8]). To obtain these group sizes, we did not limit inclusion to young-onset dementia, but included five AD and six FTD patients who were older than 70 years. In young-onset dementia, computer-aided differential diagnosis of FTD and AD would be most clinically relevant, as these patients show larger overlap of symptoms [39]. Also, we pooled the patients of several FTD subgroups (bvFTD, SD, and PNFA), which could have influenced the classification results and the regions involved in classification. The modest data size did not allow for validation on a separate validation set; instead, cross-validation was used. In addition, potential vascular white matter damage in the AD group, e.g. infarcts and white matter hyperintensities, might have influenced the classification performance of DTI. However, we expect this effect to be small, as patients were excluded when they had a history of cerebrovascular accidents (CVA) or CVA reported in their MRI examination; additionally, they were relatively young.

Given these limitations and the only borderline significance of the results, this study is primarily exploratory. Future research on a larger and more specific presenile cohort is needed. To assess the generalisability of our conclusions, evaluation on multi-centre data and a separate validation set is also necessary. With this work, we present a computer-aided diagnosis methodology based on structural MRI, ASL, and DTI that is ready to be evaluated on a larger data set when one becomes available.

In conclusion, we postulate that ASL and DTI are promising for multiparametric computer-aided diagnosis, since combining these techniques with structural MRI improved differentiation of early-onset AD and FTD in our study.